POLYPEPTIDES HAVING A FUNCTIONAL DOMAIN OF INTEREST 
AND METHODS OF IDENTIFYING AND USING SAME 



This application is a continuation-in-part of co-pending
U.S. Patent Application Serial No. 08/417,872, filed
April 7, 1995, the entire contents of which are incorporated
herein by reference.

1. Introduction

The present invention is directed to polypeptides
having a functional domain of interest or functional
equivalents thereof. Methods of identifying these
polypeptides are described, along with various methods of
their use, including but not limited to targeted drug
discovery.

2. Background of the Invention

Combinatorial libraries represent exciting new tools
in basic science research and drug design. It is possible
through synthetic chemistry or molecular biology to generate
libraries of complex polymers, with many subunit permutations.
These libraries take many forms: random peptides,
which can be synthesized on plastic pins (Geysen et al., 1987,
J. Immunol. Meth. 102:259-274), beads (Lam et al., 1991,
Nature 354:82-84) or in a soluble form (Houghten et al., 1991,
Nature 354:84-86) or expressed on the surface of viral
particles (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA
87:6378-6382; Kay et al., 1993, Gene 128:59-65; Scott and
Smith, 1990, Science 249:386-390); nucleic acids (Ellington
and Szostak, 1990, Nature 346:818-822; Gao et al., 1994, Proc.
Natl. Acad. Sci. USA 91:11207-11211; Tuerk and Gold, 1990,
Science 249:505-510); and small organic molecules (Gordon et
al., 1994, J. Med. Chem. 37:1385-1401). These libraries are
very useful in mapping protein-protein interactions and
discovering drugs.

Phage display has become a powerful method for
screening populations of peptides, mutagenized proteins, and
cDNAs for members that have affinity to target molecules of
interest. It is possible to generate 10^8 to 10^9 different
recombinants from which one or more clones can be selected
with affinity to antigens, antibodies, cell surface receptors,
protein chaperones, DNA, metal ions, etc. Library screening
is versatile because the displayed elements are expressed on
the surface of the virus as capsid-fusion proteins. The most
important consequence of this arrangement is that there is a
physical linkage between phenotype and genotype. There are
several other advantages as well: 1) virus particles which
have been isolated from libraries by affinity selection can be
regenerated by simple bacterial infection, and 2) the primary
structure of the displayed binding peptide or protein can be
easily deduced by DNA sequencing of the cloned segment in the
viral genome.

Combinatorial peptide libraries have been expressed
in bacteriophage. Synthetic oligonucleotides, fixed in
length but with multiple unspecified codons, can be cloned
into genes III, VI, or VIII of bacteriophage M13, where they
are expressed as a plurality of peptide:capsid fusion
proteins. The libraries, often referred to as random peptide
libraries, can be screened for binding to target molecules of
interest. Usually, three to four rounds of screening can be
accomplished in a week's time, leading to the isolation of one
to hundreds of binding phage.

The primary structure of the binding peptides is
then deduced by nucleotide sequencing of individual clones.
Inspection of the peptide sequences sometimes reveals a common
motif, or consensus sequence. Generally, this motif when
synthesized as a soluble peptide has the full binding
activity. Random peptide libraries have successfully yielded
peptides that bind to the Fab site of antibodies (Cwirla et
al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Scott and
Smith, 1990, Science 249:386-390), cell surface receptors
(Doorbar and Winter, 1994, J. Mol. Biol. 244:361-369; Goodson
et al., 1994, Proc. Natl. Acad. Sci. USA 91:7129-7133),
cytosolic receptors (Blond-Elguindi et al., 1993, Cell
75:717-728), intracellular proteins (Daniels and Lane, 1994,
J. Mol. Biol. 243:639-652; Dedman et al., 1993, J. Biol. Chem.
268:23025-23030; Sparks et al., 1994, J. Biol. Chem.
269:23853-23856), DNA (Krook et al., 1994, Biochem. Biophys.
Res. Comm. 204:849-854), and many other targets (Winter, 1994,
Drug Dev. Res. 33:71-89).
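
The step of inspecting sequenced clones for a common motif can be illustrated with a minimal sketch. It assumes the binding peptides are already aligned and of equal length, and it uses a simple per-position majority rule. The function name, the 60% threshold, and the example peptides (other than RPLPPLP) are hypothetical and are given only for illustration.

```python
from collections import Counter


def consensus(peptides, min_fraction=0.6):
    """Return a consensus string for equal-length, pre-aligned peptides.

    A position is reported as the most common residue when that residue
    occurs in at least `min_fraction` of the peptides, and as 'x' otherwise.
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to the same length")

    motif = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        motif.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(motif)


# Hypothetical sequences from individual binding clones (illustration only).
clones = ["RPLPPLP", "RPLPALP", "RALPPLP", "KPLPPLP", "RPLPPIP"]
print(consensus(clones))
```

In practice the clones would first have to be aligned, and a position-specific frequency table may be more informative than a single consensus string.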

Most vital cellular processes are regulated by the
transmission of signals throughout the cell in the form of
complex interactions between proteins. As the study of signal
transduction, or the flow of information throughout the cell,
has broadened and matured, it has become apparent that these
protein-protein interactions are often mediated by modular
domains within signalling proteins. Src, both the first
proto-oncogene product and the first tyrosine kinase
discovered (Taylor and Shalloway, 1993, Current Opinion in
Genetics and Development 3:26-34), is the prototypic modular
domain-containing protein.

Src is a protein tyrosine kinase of 60 kilodaltons
and is located at the plasma membrane of cells. It was first
discovered in the 1970's to be the oncogenic element of Rous
sarcoma virus, and in the 1980's, it was appreciated to be a
component of the signal transduction system in animal cells.
However, since the identification of viral and cellular forms
of Src (i.e., v-Src and c-Src), their respective roles in
oncogenesis, normal cell growth, and differentiation have not
been completely understood.

In addition to its tyrosine kinase region (sometimes
called a Src Homology 1 domain), Src contains two regions that
have been found to have functionally and structurally
homologous counterparts in a large number of proteins. These
regions have been designated the Src Homology 2 (SH2) and Src
Homology 3 (SH3) domains. SH2 and SH3 domains are modular in
that they fold independently of the protein that contains
them, their secondary structure places N- and C-termini close
to one another in space, and they appear at variable locations
(anywhere from N- to C-terminal) from one protein to the next
(Cohen et al., 1995, Cell 80:237-248). SH2 domains have been
well-studied and are known to be involved in binding to
phosphorylated tyrosine residues (Pawson and Gish, 1992, Cell
71:359-362).

The Src-homology region 3 (SH3) of Src is a domain
that is 60-70 amino acids in length and is present in many
cellular proteins (Cohen et al., 1995, Cell 80:237-248;
Pawson, 1995, Nature 373:573-580). Within Src, the SH3 domain
is considered to be a negative inhibitory domain, because
c-Src can be activated (i.e., transforming) through mutations in
this domain (Jackson et al., 1993, Oncogene 8:1943-1956;
Seidel-Dugan et al., 1992, Mol. Cell. Biol. 12:1835-1845).

To deduce the binding specificity of the Abl SH3
domain, a group led by David Baltimore screened cDNA libraries
with radiolabeled GST-Abl SH3 fusion protein and identified
two binding cDNA clones (Cicchetti et al., 1992, Science
257:803-806). Both clones encoded proteins with proline-rich
regions that were later shown to be SH3 binding domains.

Subsequently, others have screened combinatorial
peptide libraries and identified peptides that bound to the
Src SH3 domain (Yu et al., 1994, Cell 76:933-945; Cheadle et
al., 1994, J. Biol. Chem. 269:24034-24039). Using the SH3
domain of Src, Sparks et al. (1994, J. Biol. Chem.
269:23853-23856) screened phage-display random peptide libraries
and identified a consensus peptide sequence that binds with
specificity and high affinity to the Src SH3 domain.

The consensus from these various studies is that the
optimal Src SH3 peptide ligand is RPLPPLP (SEQ ID NO:45).
Recently, the structures of the peptide-SH3 domain complexes
have been deduced by NMR and the peptides have been shown to
bind in two possible orientations with respect to the SH3
domain (Feng et al., 1994, Science 266:1241-1247; Lim et al.,
1994, Nature 372:375-379).

Since SH3 domains have been found to have such
important roles in the function of crucial signalling and
structural elements in the cell, a method of identifying
proteins containing SH3 regions is of great interest. In this
regard, it is important to note that such a method is
unavailable because of the low sequence similarity of modular
functional domains, including SH3. See, e.g., Figure 6, which
illustrates the minimal primary sequence homology among
various known SH3 domains.
Sequence homology searches can potentially identify
known proteins containing not yet recognized functional
domains of interest; however, sequence homology generally
needs to be >40% for this procedure to be successful.
Functional domains generally are less than 40% homologous and
therefore many would be missed in a sequence homology search.
In addition, homology searches do not identify novel proteins;
they only identify proteins already defined by nucleotide or
amino acid sequence and present in the database.
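
The 40% cutoff discussed above can be made concrete with a short sketch. It assumes that "homology" is scored as percent identity over aligned, ungapped positions and that gaps are written as periods. The function names and the example sequences are hypothetical and are not taken from the figures.

```python
def percent_identity(aligned_a, aligned_b, gap="."):
    """Percent identity over positions where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped positions
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def found_by_homology_search(aligned_a, aligned_b, threshold=40.0):
    """True when the pair exceeds the assumed >40% cutoff named in the text."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Hypothetical aligned fragments (illustration only).
print(percent_identity("ALYDY.EA", "ALFDFDEA"))
print(found_by_homology_search("AAAAA", "AABBB"))
```

A pair sitting exactly at 40% is not reported by this sketch, which mirrors the ">40%" wording; real search tools use substitution matrices and statistical scores rather than a bare identity cutoff.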

Another approach is to use hybridization techniques
using nucleotide probes to search expression libraries for
novel proteins. This method would have limited applicability
to finding novel proteins containing functional domains due to
the low sequence homology of the functional domains.

Methods for isolating partner proteins involved in
protein-protein interactions have generally focused on finding
a ligand to a protein that has been found and characterized.
Such approaches have included using anti-idiotypic antibodies
that mimic the known protein to screen cDNA expression
libraries for a binding ligand (Jerne, 1974, Ann. Immunol.
(Inst. Pasteur) 125C:373-389; Sudol, 1994, Oncogene
9:2145-2152). Skolnick et al. (1991, Cell 65:83-90) isolated a
binding partner for PI3-kinase by screening a cDNA expression
library with the 32P-labeled tyrosine phosphorylated carboxyl
terminus of the epidermal growth factor receptor (EGFR).

An easy method for isolating operationally defined
ligands involved in protein-protein interactions and for
optimally identifying an exhaustive set of modular
domain-containing proteins implicated in binding with the ligands
would be highly desirable.

If such a method were available, it would be useful
for the isolation of any polypeptide
having a functioning version of any functional domain of
interest. Such a general method would be of tremendous
utility in that whole families of related proteins, each with
its own version of the functional domain of interest, could be
identified. Knowledge of such related proteins would
contribute greatly to our understanding of various
physiological processes, including cell growth or death,
malignancy, and immune reactions, to name a few. Such a
method would also contribute to the development of
increasingly more effective therapeutic, diagnostic, or
prophylactic agents having fewer side effects.

According to the present invention, just such a 
method is provided. 

Regarding SH3 domain-containing proteins, the method
of the present invention will contribute greatly to our
understanding of cell growth (Zhu et al., 1993, J. Biol. Chem.
268:1775-1779; Taylor and Shalloway, 1994, Nature
368:867-871), malignancy (Wages et al., 1992, J. Virol. 66:1866-1874;
Bruton and Workman, 1993, Cancer Chemother. Pharmacol.
32:1-19), subcellular localization of proteins to the cytoskeleton
and/or cellular membranes (Weng et al., 1993, J. Biol. Chem.
268:14956-14963; Bar-Sagi et al., 1993, Cell 74:83-91), signal
transduction (Duchesne et al., 1993, Science 259:525-528),
cell morphology (Wages et al., 1992, J. Virol. 66:1866-1874;
McGlade et al., 1993, EMBO J. 12:3073-3081), neuronal
differentiation (Tanaka et al., 1993, Mol. Cell. Biol.
13:4409-4415), T cell activation (Reynolds et al., 1992, Oncogene
7:1949-1955), and cellular oxidase activity (McAdara and
Babior, 1993, Blood 82:A28).

Citation of a reference hereinabove shall not be
construed as an admission that such is prior art to the
present invention.

3. SUMMARY OF THE INVENTION 
In general, the present invention is directed to a
method of using isolated, operationally defined ligands
involved in binding interactions for optimally identifying an
exhaustive set of compounds binding to such ligands. In one
embodiment, the isolated ligands are peptides involved in
specific protein-protein interactions and are used to identify
a set of novel modular domain-containing proteins that bind to
the ligands. Using this method, proteins sharing only modest
similarities but a common function can be found.

The present invention is directed to a method of
identifying a polypeptide or family of polypeptides having a
functional domain of interest. The basic steps of the method
comprise: (a) choosing a recognition unit or set of
recognition units having a selective affinity for a target
molecule with a functional domain of interest; (b) contacting
the recognition unit with a plurality of polypeptides; and
(c) identifying a polypeptide having a selective binding
affinity for the recognition unit, which polypeptide includes
the functional domain of interest or a functional equivalent
thereof.

In one particular embodiment of the invention,
exhaustive screening of proteins having a desired functional
domain involves an iterative process by which ligands or
recognition units for SH3 domains identified in the first
round of screening are used to detect SH3 domain-containing
proteins in successive expression library screens.

More particularly, the method of the present
invention includes choosing a recognition unit having a
selective affinity for a target molecule with a functional
domain of interest. With this recognition unit (particularly
under the multivalent recognition unit screening conditions
taught by the present invention), it has further been
discovered that a plurality of polypeptides from various
sources can be examined such that certain polypeptides having
a selective binding affinity for the recognition unit can be
identified. The polypeptides so identified have been shown to
include the functional domain of interest; that is, the
functional domains found are working versions that are capable
of displaying the same binding specificity as the functional
domain of interest. Hence, the polypeptides identified by the
present method also possess those attributes of the functional
domain of interest which allow these related polypeptides to
exhibit the same, similar, or analogous (but functionally
equivalent) selective affinity characteristics as the domain
of interest of the initial target molecule. By screening the
plurality of polypeptides for recognition unit binding, the
methods of the present invention circumvent the limitations of
conventional DNA-based screening methods and allow for the
identification of highly disparate protein sequences
possessing functionally equivalent functional domains.

In specific embodiments of the present invention,
the plurality of polypeptides is obtained from the proteins
present in a cDNA expression library. The specificity of the
polypeptides which bear the functional domain of interest or a
functional equivalent thereof for various peptides or
recognition units can subsequently be examined, allowing for a
greater understanding of the physiological role of particular
polypeptide/recognition unit interactions. Indeed, the
present invention provides a method of targeted drug discovery
based on the observed effects of a given drug candidate on the
interaction between a recognition unit-polypeptide pair or a
recognition unit and a "panel" of related polypeptides each
with a copy or a functional equivalent of (e.g., capable of
displaying the same binding specificity and thus binding to
the same recognition unit as) the functional domain of
interest.

The present invention also provides polypeptides
comprising certain amino acid sequences. Moreover, the
present invention also provides nucleic acids, including
certain DNA constructs comprising certain coding sequences.
Using the methods of the present invention, more than eighteen
different SH3 domain-containing proteins have been identified,
over half of which have not been previously described.

The present inventors have found, unexpectedly, that
the valency (i.e., whether it is a monomer, dimer, tetramer,
etc.) of the recognition unit that is used to screen an
expression library or other source of polypeptides apparently
has a marked effect upon the specificity of the recognition
unit-functional domain interaction. The present inventors
have discovered that recognition units in the form of small
peptides, in multivalent form, have a specificity that is
eased but not forfeited. In particular, biotinylated peptides
bound to a multivalent (believed to be tetravalent)
streptavidin-alkaline phosphatase complex have an unexpected
generic specificity. This allows such peptides to be used to
screen libraries to identify classes of polypeptides
containing functional domains that are similar but not
identical in sequence to the peptides' original target
functional domains.

The present invention also provides methods for
identifying potential new drug candidates (and potential lead
compounds) and determining the specificities thereof. For
example, knowing that a polypeptide with a functional domain
of interest and a recognition unit, e.g., a binding peptide,
exhibit a selective affinity for each other, one may attempt
to identify a drug that can exert an effect on the
polypeptide-recognition unit interaction, e.g., either as an
agonist or as an antagonist (inhibitor) of the interaction.
With this assay, then, one can screen a collection of
candidate "drugs" for the one exhibiting the most desired
characteristic, e.g., the most efficacious in disrupting the
interaction or in competing with the recognition unit for
binding to the polypeptide.
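
One way such a screen could be scored is sketched below. It assumes that each candidate yields a single binding signal (for example, an optical density) measured with the polypeptide and recognition unit present, and that inhibition is expressed relative to a no-candidate control after background subtraction. The function names, candidate names, and readings are hypothetical.

```python
def percent_inhibition(signal_with_candidate, signal_control, background=0.0):
    """Percent reduction of the binding signal caused by a candidate."""
    span = signal_control - background
    if span <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - (signal_with_candidate - background) / span)


def rank_candidates(readings, signal_control, background=0.0):
    """Return (name, percent inhibition) pairs, strongest inhibitor first."""
    scored = {
        name: percent_inhibition(value, signal_control, background)
        for name, value in readings.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Hypothetical optical densities (illustration only).
readings = {"cand_A": 0.55, "cand_B": 0.19, "cand_C": 1.00}
for name, inhibition in rank_candidates(readings, signal_control=1.0, background=0.1):
    print(f"{name}: {inhibition:.1f}% inhibition")
```

A negative value in this sketch would indicate a candidate that enhances binding, corresponding to the agonist case mentioned above.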

In addition, the present invention also provides
certain assay kits and methods of using these assay kits for
screening drug candidates for their ability to affect the
binding of a polypeptide containing a functional domain to a
recognition unit. In a particular aspect of the present
invention, the assay kit comprises: (a) a polypeptide
containing a functional domain of interest; and (b) a
recognition unit having a selective binding affinity for the
polypeptide. Yet another assay kit may comprise a plurality
of polypeptides, each polypeptide containing a functional
domain of interest, in which the functional domain of interest
is a domain selected from the group consisting of an SH1, SH2,
SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc
finger, leucine zipper, and helix-turn-helix, and at least one
recognition unit having a selective affinity for each of the
plurality of polypeptides.

Other objects of the present invention will be 
apparent to those of ordinary skill upon further consideration 
of the following detailed description. 

4. DESCRIPTION OF THE FIGURES

Figure 1 is a schematic representation of the
general aspects of a method of identifying recognition units
exhibiting a selective affinity for a target molecule with a
functional domain of interest. In this illustration, the
target molecule is a polypeptide with an SH3 domain, and the
recognition units are peptides having a selective affinity for
the SH3 domain that are expressed in a phage-displayed
library.

Figure 2 illustrates the selectivities exhibited by
particular recognition units that bind to the Src SH3 domain
(in this case, two heptapeptides) for a "panel" of
polypeptides known to contain an SH3 domain. The non-SH3-containing
protein, GST, serves as control. RPLPPLP is SEQ ID NO:45;
APPVPPR is SEQ ID NO:203.

Figure 3 is a schematic representation of the
general method of identifying polypeptides with a functional
domain of interest by screening a plurality of polypeptides
using a suitable recognition unit. In the illustration, the
plurality of polypeptides is obtained from a cDNA expression
library, and the recognition units are SH3 domain-binding
peptides.

Figure 4 illustrates how an SH3 domain-binding
peptide can be used to identify other SH3 domain-containing
proteins. Shown is a schematic representation of the
progression from initial selection of a target molecule with a
functional domain of interest, choice of recognition unit, and
identification of polypeptides that have a selective affinity
for the recognition unit and include the functional domain of
interest or a functional equivalent thereof.

Figure 5 depicts filters from primary (Figure 5B)
and tertiary (Figure 5A) screens of a λcDNA library probed
with a biotinylated SH3-binding peptide recognition unit in
the form of a complex with streptavidin-alkaline phosphatase
(SA-AP). A mouse 16 day embryo cDNA library in λEXlox was
incubated with a multivalent complex formed between
biotinylated pSrcCII and SA-AP. The sites of peptide binding
were detected by incubation with BCIP
(5-bromo-4-chloro-3-indolyl-phosphate-p-toluidine salt) and NBT (nitroblue
tetrazolium chloride) for approximately five minutes.

Figure 6 shows an alignment of SH3 domains that
illustrates the minimal primary sequence homology among
various known SH3 domains. The amino acid sequences shown are
SEQ ID NOs:68-111.

Figure 7A is a schematic representation of a
population of functional domains represented by the circles.
"A" is a recognition unit specific to one circle only. B, on
the other hand, recognizes three domains, while B1 and B2
recognize only two each. Figure 7B illustrates an iterative
method whereby new recognition units are chosen based on
polypeptides uncovered with the first recognition unit(s).
These new recognition units lead to the identification of
other related polypeptides, etc., expanding the scope of the
study to increasingly diverse members of the related
population.

Figure 8 illustrates the binding specificity of
several SH3 domain recognition units. Biotinylated Class I
(pSrcCI) or Class II (pSrcCII) Src SH3 domain recognition
units, Crk SH3 domain recognition units (pCrk), PLCγ SH3
domain recognition units (pPLC), and Abl SH3 domain
recognition units (pAbl) were tested for binding to the
indicated GST-SH3 domain fusion proteins immobilized onto
duplicate microtiter plate wells. Recognition units are
listed along the left side of the figure; GST-SH3 domain
fusion proteins are listed along the bottom. Recognition
units were incubated either as multivalent complexes of
biotinylated peptides and streptavidin-horseradish peroxidase
(SA-HRP) (complexed) or as monovalent biotinylated peptides
(uncomplexed), followed by incubation with SA-HRP. Average
optical densities are shown.

Figure 9 shows a schematic of SH3 domain-containing
proteins isolated using the present invention. The name,
identity, type of screen, and number of individual clones
derived for each sequence are indicated. Diagrams are to
scale, with SH3 domains representing approximately 60 amino
acids. The abbreviations AR, P, CR, E/P, and SH2 represent
ankyrin repeats, proline-rich segments, Cortactin repeats,
glutamate/proline-rich segments, and Src homology 2 domains,
respectively. Flared ends represent putative translation
initiation sites for individual cDNAs. The Mouse, Human 1,
and Human 2 libraries correspond to mouse 16 day embryo, human
bone marrow, and human prostate cancer cDNA libraries,
respectively. For a description of the pSrcII and pCort
recognition units, see Section 6.1.

Figures 10A and 10B depict the sequence alignment of
SH3 domains in proteins isolated using the present invention.
The name and identity of each clone is indicated. Where
appropriate, multiple SH3 domains from the same polypeptide
are designated A, B, C, etc., from N- to C-terminal. Periods
indicate gaps introduced to maximize alignment of similar
residues. Positions corresponding to conserved residues shown
to be involved in ligand binding in the SH3 domains of Src and
Grb2/Sem5 (Tomasetto et al., 1995, Genomics 28:367-376) are
presented in bold and underlined, respectively. Primary
structures of SH3P1-8 and SH3P10-13 correspond to mouse,
SH3P15-18, clone 5, 34, 40, 41, 45, 53, 55, 56, and 65 to
human, and SH3P9 and SH3P14 to mouse (m) or human (h) cDNA
clones. For sequence comparison, the sequence of the mouse
c-Src SH3 domain (GenBank accession number P41240) is shown.
The GenBank accession numbers for mouse Cortactin, SPY75/HS1,
Crk, and human MLN50, Lyn, Fyn, and Src are U03184, D42120,
S72408, X82456, M16038, P06241, and P41240, respectively. The
amino acid sequences shown are SEQ ID NOs:112-140.

Figure 11 depicts the specificity continuum
described in Section 5.2.1. "SA-AP peptide complex"
represents the multivalent (believed to be tetravalent)
complex of streptavidin-alkaline phosphatase and biotinylated
peptide described in that section.

Figure 12 depicts the results of experiments in
which peptide recognition units were synthesized and tested
for their ability to bind to novel SH3 domains described in
Sections 6.1 and 6.1.1. A minus indicates no binding; a plus
indicates binding, with the number of pluses indicating the
strength of binding. For further details, see Section 6.2.
The amino acid sequences shown are SEQ ID NOs:141-168.

Figure 13 depicts more data from the experiment 
depicted in Figure 12. The amino acid sequences shown are SEQ 
ID NOs: 169-188. 

Figure 14 illustrates the effect of preconjugation
with streptavidin-alkaline phosphatase on the affinity of
biotinylated peptides for SH3 domains. See Section 6.3.1 for
details.

Figure 15 illustrates the effect of preconjugation
with streptavidin-alkaline phosphatase on the specificity of
biotinylated peptides for GST-SH3 domain fusion proteins that
have been immobilized on nylon membranes. See Section 6.3.2
for details.

Figure 16 illustrates the effect of preconjugation
with streptavidin-alkaline phosphatase on the specificity of
biotinylated peptides for proteins containing SH3 domains
expressed by cDNA clones. See Section 6.3.3 for details.

Figure 17 illustrates a strategy for exhaustively
screening an expression library for SH3 domain-containing
proteins. A peptide recognition unit is generated by
screening a combinatorial peptide library for binders to an
SH3 domain expressed bacterially as a GST fusion protein.
This peptide is then used as a multivalent
streptavidin-biotinylated peptide complex to screen for a subset of the SH3
domain-containing proteins represented in a cDNA expression
library. A combinatorial library is once again used to
identify recognition units of SH3 domains identified in the
first expression library screen; these recognition units
identify overlapping sets of proteins from the expression
library. With multiple iterations of this process, it should
be possible to clone systematically all SH3 domains
represented in a given cDNA expression library.
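
The iterative strategy of Figure 17 can be pictured as a worklist procedure that stops when a round uncovers no new domains. The sketch below is only a toy model: the two callables stand in for the laboratory steps (combinatorial library screening and expression library screening), and the domain names, ligand names, and binding tables are hypothetical.

```python
from collections import deque


def exhaustive_screen(start_domain, find_ligands, screen_library):
    """Iterate ligand discovery and library screening until no new domain appears.

    find_ligands(domain)   -> iterable of recognition units binding that domain
    screen_library(ligand) -> iterable of domains in the library bound by it
    """
    found = {start_domain}
    used_ligands = set()
    queue = deque([start_domain])
    while queue:
        domain = queue.popleft()
        for ligand in find_ligands(domain):
            if ligand in used_ligands:
                continue  # this recognition unit has already been screened
            used_ligands.add(ligand)
            for hit in screen_library(ligand):
                if hit not in found:
                    found.add(hit)
                    queue.append(hit)
    return found


# Hypothetical binding tables standing in for experimental results.
LIGANDS = {"Src": ["pA"], "D1": ["pB"], "D2": ["pA"], "D3": []}
BINDS = {"pA": {"Src", "D1", "D2"}, "pB": {"D1", "D3"}}

result = exhaustive_screen(
    "Src",
    find_ligands=lambda domain: LIGANDS.get(domain, []),
    screen_library=lambda ligand: BINDS.get(ligand, set()),
)
print(sorted(result))
```

In this model a domain that no available recognition unit binds is never reached, which reflects why the text says only that it "should be possible" to recover all domains represented in a library.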

Figure 18 depicts the nucleotide sequence of SH3P1,
mouse p53bp2 (SEQ ID NO:5).

Figure 19 depicts the amino acid sequence of SH3P1,
mouse p53bp2 (SEQ ID NO:6).

Figure 20 depicts the nucleotide sequence of SH3P2,
a novel mouse gene (SEQ ID NO:7).

Figure 21 depicts the amino acid sequence of SH3P2,
a novel mouse gene (SEQ ID NO:8).

Figure 22 depicts the nucleotide sequence of SH3P3,
a novel mouse gene (SEQ ID NO:9).

Figure 23 depicts the amino acid sequence of SH3P3,
a novel mouse gene (SEQ ID NO:10).

Figure 24 depicts the nucleotide sequence of SH3P4,
a novel mouse gene (SEQ ID NO:11).

Figure 25 depicts the amino acid sequence of SH3P4,
a novel mouse gene (SEQ ID NO:12).

Figure 26 depicts the nucleotide sequence of SH3P5,
mouse Cortactin (SEQ ID NO:13).

Figure 27 depicts the amino acid sequence of SH3P5,
mouse Cortactin (SEQ ID NO:14).

Figure 28 depicts the nucleotide sequence of SH3P6,
mouse MLN50 (SEQ ID NO:15).

Figure 29 depicts the amino acid sequence of SH3P6,
mouse MLN50 (SEQ ID NO:16).

Figure 30 depicts the nucleotide sequence of SH3P7,
a novel mouse gene (SEQ ID NO:17).

Figure 31 depicts the amino acid sequence of SH3P7,
a novel mouse gene (SEQ ID NO:18).

Figure 32 depicts the nucleotide sequence of SH3P8,
a novel mouse gene (SEQ ID NO:19).

Figure 33 depicts the amino acid sequence of SH3P8,
a novel mouse gene (SEQ ID NO:20).

Figure 34 depicts the nucleotide sequence of SH3P9,
a novel mouse gene (SEQ ID NO:21).



Figure 35 depicts the amino acid sequence of SH3P9,
a novel mouse gene (SEQ ID NO:22).

Figure 36 depicts the nucleotide sequence of SH3P9,
a novel human gene (SEQ ID NO:23).

Figure 37 depicts the amino acid sequence of SH3P9,
a novel human gene (SEQ ID NO:24).

Figure 38 depicts the nucleotide sequence of SH3P10,
mouse HS1 (SEQ ID NO:25).

Figure 39 depicts the amino acid sequence of SH3P10,
mouse HS1 (SEQ ID NO:26).

Figure 40 depicts the nucleotide sequence of SH3P11,
mouse Crk (SEQ ID NO:27).

Figure 41 depicts the amino acid sequence of SH3P11,
mouse Crk (SEQ ID NO:28).

Figure 42A depicts the nucleotide sequence from
positions 1-2600 of SH3P12, a novel mouse gene (a portion of
SEQ ID NO:29).

Figure 42B depicts the nucleotide sequence from
positions 2601-3335 of SH3P12, a novel mouse gene (a portion
of SEQ ID NO:29).

Figure 43 depicts the amino acid sequence of SH3P12,
a novel mouse gene (SEQ ID NO:30).

Figure 44 depicts the nucleotide sequence of SH3P13,
a novel mouse gene (SEQ ID NO:31).

Figure 45 depicts the amino acid sequence of SH3P13,
a novel mouse gene (SEQ ID NO:32).

Figure 46A depicts the nucleotide sequence from
positions 1-2400 of SH3P14, mouse H74 (a portion of SEQ ID
NO:33).

Figure 46B depicts the nucleotide sequence from
positions 2351-4091 of SH3P14, mouse H74 (a portion of SEQ ID
NO:33).

Figure 47 depicts the amino acid sequence of SH3P14, 
mouse H74 (SEQ ID NO:34). 

Figure 48 depicts the nucleotide sequence of SH3P14,
human H74 (SEQ ID NO:35).

Figure 49 depicts the amino acid sequence of SH3P14,
human H74 (SEQ ID NO:36).

Figure 50 depicts the nucleotide sequence of SH3P17,
a novel human gene (SEQ ID NO:37).

Figure 51 depicts the amino acid sequence of SH3P17,
a novel human gene (SEQ ID NO:38).

Figure 52A depicts the nucleotide sequence of
SH3P18, a novel human gene (SEQ ID NO:39).

Figure 53 depicts the amino acid sequence of SH3P18,
a novel human gene (SEQ ID NO:40).

Figure 54 depicts the nucleotide sequence of clone
55, a novel human gene (SEQ ID NO:189).

Figure 55 depicts the amino acid sequence of clone
55, a novel human gene (SEQ ID NO:190).

Figure 56 depicts the nucleotide sequence of clone
56, a novel human gene (SEQ ID NO:191).

Figure 57 depicts the amino acid sequence of clone
56, a novel human gene (SEQ ID NO:192).

Figure 58A depicts the nucleotide sequence from
position 1-1720 of clone 65, a novel human gene (a portion of
SEQ ID NO:193).

Figure 58B depicts the nucleotide sequence from
position 1721-2873 of clone 65, a novel human gene (a portion
of SEQ ID NO:193).

Figure 59 depicts the amino acid sequence of clone
65, a novel human gene (SEQ ID NO:194).

Figure 60 depicts the nucleotide sequence of clone 
34, a novel human gene (SEQ ID NO: 195). 

Figure 61A depicts a portion of the amino acid
sequence of clone 34, a novel human gene (a portion of SEQ ID
NO: 196).

Figure 61B depicts a portion of the amino acid
sequence of clone 34, a novel human gene (a portion of SEQ ID
NO: 196).

Figure 62 depicts the nucleotide sequence of clone
41, a novel human gene (SEQ ID NO: 197).

Figure 63A depicts a portion of the amino acid
sequence of clone 41, a novel human gene (a portion of SEQ ID
NO: 198).

Figure 63B depicts a portion of the amino acid
sequence of clone 41, a novel human gene (a portion of SEQ ID
NO: 198).

Figure 64A depicts the nucleotide sequence of clone
53, a novel human gene (SEQ ID NO: 199).

Figure 65A depicts a portion of the amino acid
sequence of clone 53, a novel human gene (a portion of SEQ ID
NO: 200).

Figure 65B depicts a portion of the amino acid
sequence of clone 53, a novel human gene (a portion of SEQ ID
NO: 200).

Figures 66A and 66B depict the nucleotide sequence
(SEQ ID NO: 220) and amino acid sequence (SEQ ID NO: 221) of
clone 5, a novel human gene.

5. DETAILED DESCRIPTION OF THE INVENTION

As stated above, the present invention is related
broadly to certain polypeptides having a functional domain of
interest and is directed to methods of identifying and using
these polypeptides. The present invention is also directed to
a method of using isolated, operationally defined ligands
involved in binding interactions for optimally identifying an
exhaustive set of compounds binding such ligands and to
compounds, target molecules, and, in one embodiment,
polypeptides having a functional domain of interest and to
methods of using these compounds. The detailed description
that follows is provided to elucidate the invention further
and to assist further those of ordinary skill who may be
interested in practicing particular aspects of the invention.
First, certain definitions are in order.

Accordingly, the term "polypeptide" refers to a molecule
comprised of amino acid residues joined by peptide (i.e.,
amide) bonds and includes proteins and peptides. Hence, the
polypeptides of the present invention may have single or
multiple chains of covalently linked amino acids and may
further contain intrachain or interchain linkages comprised of
disulfide bonds. Some polypeptides may also form a subunit of
a multiunit macromolecular complex. Naturally, the
polypeptides can be expected to possess conformational
preferences and to exhibit a three-dimensional structure.
Both the conformational preferences and the three-dimensional
structure will usually be defined by the polypeptide's primary
(i.e., amino acid) sequence and/or the presence (or absence)
of disulfide bonds or other covalent or non-covalent
intrachain or interchain interactions.

The polypeptides of the present invention can be any
size. As can be expected, the polypeptides can exhibit a wide
variety of molecular weights, some exceeding 150 to 200
kilodaltons (kD). Typically, the polypeptides may have a
molecular weight ranging from about 5,000 to about 100,000
daltons. Still others may fall in a narrower range, for
example, about 10,000 to about 75,000 daltons, or about 20,000
to about 50,000 daltons.

The phrase "functional domain" refers to a region of
a polypeptide which affords the capacity to perform a
particular function of interest. This function may give rise
to a biological, chemical, or physiological consequence that
may be reversible or irreversible and which may include, but
not be limited to, protein-protein interactions (e.g., binding
interactions) involving the functional domain, a change in the
conformation or a transformation into a different chemical
state of the functional domain or of molecules acted upon by
the functional domain, the transduction of an intracellular or
intercellular signal, the regulation of gene or protein
expression, the regulation of cell growth or death, or the
activation or inhibition of an immune response. Furthermore,
the functional domain of interest is defined by a particular
functional domain that is present in a given target molecule.
A discussion of the selection of a particular functional
domain-containing target molecule is presented further below.

Many functional domains tend to be modular in that
such domains may occur one or more times in a given
polypeptide (or target molecule) or may be found in a family
of different polypeptides. When found more than once in a
given polypeptide or in different polypeptides, the modular
functional domain may possess substantially the same
structure, in terms of primary sequence and/or three-
dimensional space, or may contain slight or great variations
or modifications among the different versions of the
functional domain of interest.

What is important, however, is that these related
functional domains retain the functional aspects of the
functional domain of interest present in the target molecule.
It is stressed that, indeed, it is this functional
relationship among two or more possible versions of a
functional domain of interest which may be identified,
defined, and exploited by the methods of the present
invention. In a preferred aspect, the function of interest is
the ability to bind to a molecule (e.g., a peptide) of
interest.

The present invention provides a general strategy by
which recognition units that bind to a functional domain-
containing molecule can be used to screen expression libraries
of genes (e.g., cDNA, genomic libraries) systematically for
novel functional domain-containing proteins. In specific
embodiments, the recognition units are previously isolated
from a random peptide library, or are known peptide ligands or
recognition units, or are recognition units that are
identified by database searches for sequences having homology
to a peptide recognition unit having the binding specificity
of interest. Using the methods of the present invention, it
is possible to exhaustively screen an expression library for
proteins with a given functional domain.

In the prior art, novel genes (and thus their
encoded protein products) are most commonly identified from
cDNA libraries. Generally, an appropriate cDNA library is
screened with a probe that is either an oligonucleotide or an
antibody. In either case, the probe must be specific enough
for the gene that is to be identified to pick that gene out
from a vast background of non-relevant genes in the library.
It is this need for a specific probe that is the highest
hurdle that must be overcome in the prior art identification
of novel genes. Another method of identifying genes from cDNA
libraries is through use of the polymerase chain reaction
(PCR) to amplify a segment of a desired gene from the library.
PCR requires that oligonucleotides having sequence similarity
to the desired gene be available.

If the probe used in prior art methods is a nucleic
acid, the cDNA library may be screened without the need for
expressing any protein products that might be encoded by the
cDNA clones. If the probe used in prior art methods is an
antibody, then it is necessary to build the cDNA library into
a suitable expression vector. For a comprehensive discussion
of the art of identifying genes from cDNA libraries, see
Sambrook, Fritsch, and Maniatis, "Construction and Analysis of
cDNA Libraries," Chapter 8 in Cloning, A Laboratory Manual, 2d
ed., Cold Spring Harbor Laboratory Press, 1989. See also
Sambrook, Fritsch, and Maniatis, "Screening Expression
Libraries with Antibodies and Oligonucleotides," Chapter 12 in
Cloning, A Laboratory Manual, 2d ed., Cold Spring Harbor
Laboratory Press, 1989.

As an alternative to cDNA libraries, genomic
libraries are used. When genomic libraries are used in prior
art methods, the probe is virtually always a nucleic acid
probe. See Sambrook, Fritsch, and Maniatis, "Analysis and
Cloning of Eukaryotic Genomic DNA," Chapter 9 in Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory
Press, 1989.

In the prior art, nucleic acid probes used in
screening libraries are often based upon the sequence of a
known gene that is thought to be homologous to the gene that
is to be isolated. The success of the procedure depends
upon the degree of homology between the probe and the target
gene being sufficiently high. Probes based upon the sequences
of known functional domains in proteins had limited value
because, while the sequences of the functional domains were
similar enough to allow for their recognition as shared
domains, the similarity was not so high that the probes could
be used to screen cDNA or genomic libraries for genes
containing the functional domains.

PCR may also be used to identify genes from genomic
libraries. However, as in the case of using PCR to identify
genes from cDNA libraries, this requires that oligonucleotides
having sequence similarity to the desired gene be available.

Using the screening methods provided by the present
invention, DNA encoding proteins having a desired functional
domain that would not be readily identified by sequence
homology can be identified by functional binding specificity
to recognition units. By virtue of an ease in the specificity
of binding requirements conferred by the screening methods of
the present invention, many novel, functionally homologous,
functional domain-containing proteins can be identified.
Although not intending to be bound by any mechanistic
explanation, this ease in binding specificity is believed to
be the result of the use of a multivalent peptide recognition
unit to screen the gene library, the unit preferably having a
valency greater than bivalent, more preferably tetravalent or
greater, and most preferably being the streptavidin-
biotinylated peptide recognition unit complex.

In one particular embodiment of the invention,
exhaustive screening of proteins having a desired functional
domain involves an iterative process by which recognition
units for SH3 domains identified in the first round of
screening are used to detect SH3 domain-containing proteins in
successive expression library screens (see Figure 17). This
strategy enables one to search "sequence space" in what might
be thought of as ever-widening circles with each successive
cycle. This iterative strategy can be initiated even when
only one functional domain-containing protein and recognition
unit are available.

This iterative process is not limited to proteins
containing SH3 domains. Members within a class of other
functional domains also tend to have overlapping, or at least
similar, recognition unit preferences, are structurally stable,
and often confer similar binding properties to a wide variety
of proteins. These characteristics predict that the methods
of the present invention will be applicable to a wide variety
of functional domain-containing proteins in addition to their
applicability to SH3 domain-containing proteins.
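
The iterative strategy described above can be pictured as a
loop that alternates ligand selection and library screening
until no new proteins turn up. The following is only a minimal
sketch under stated assumptions: the function names
(iterative_screen, select_ligands, screen_library) and the toy
lookup tables are hypothetical stand-ins for laboratory steps
and do not appear in the specification.

```python
from typing import Callable, Iterable, Set


def iterative_screen(
    seed_protein: str,
    select_ligands: Callable[[str], Iterable[str]],
    screen_library: Callable[[str], Iterable[str]],
    max_rounds: int = 5,
) -> Set[str]:
    """Alternate ligand selection and library screening until nothing new is found."""
    found = {seed_protein}
    frontier = {seed_protein}
    for _ in range(max_rounds):
        # Select recognition units against each newly found protein.
        ligands = {lig for protein in frontier for lig in select_ligands(protein)}
        # Probe the expression library with those recognition units.
        hits = {hit for lig in ligands for hit in screen_library(lig)}
        # Only proteins not seen before seed the next round.
        frontier = hits - found
        if not frontier:
            break
        found |= frontier
    return found


# Hypothetical toy tables standing in for wet-lab results.
LIGANDS_FOR = {"Src": ["pepA"], "P1": ["pepB"], "P2": []}
HITS_FOR = {"pepA": ["Src", "P1"], "pepB": ["P1", "P2"]}

result = iterative_screen(
    "Src",
    lambda protein: LIGANDS_FOR.get(protein, []),
    lambda ligand: HITS_FOR.get(ligand, []),
)
print(sorted(result))
```

In this toy run a single starting protein is enough to reach
two further proteins over successive rounds, which mirrors the
"ever-widening circles" picture given above.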

5.1. Discovery of Novel Genes and Polypeptides Containing
Functional Domains

The present invention provides methods for the
identification of one or more polypeptides (in particular, a
"family" of polypeptides, including the target molecule) that
contain a functional domain of interest that either
corresponds to or is the functional equivalent of a functional
domain of interest present in a predetermined target molecule.
The present invention provides a mechanism for the
rapid identification of genes (e.g., cDNAs) encoding virtually
any functional domain of interest. By screening cDNA
libraries or other sources of polypeptides for recognition
unit binding rather than sequence similarity, the present
invention circumvents the limitations of conventional DNA-
based screening methods and allows for the identification of
highly disparate protein sequences possessing equivalent
functional activities. The ability to isolate entire
repertoires of proteins containing particular modular
functional domains will prove invaluable both in molecular
biological investigations of the genome and in bringing new
targets into drug discovery programs.

It should likewise be apparent that a wide range of
polypeptides having a functional domain of interest can be
identified by the process of the invention, which process
comprises:

(a) contacting a multivalent recognition unit
complex with a plurality of polypeptides; and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

In a specific embodiment, the process comprises:

(a) contacting a multivalent recognition unit
complex with a plurality of polypeptides from which it is
desired to identify a polypeptide having selective binding
affinity for the recognition unit, in which the valency of the
recognition unit in the complex is at least two, or at least
four; and

(b) identifying, and preferably recovering, a
polypeptide having a selective binding affinity for the
recognition unit complex.

In another specific embodiment, the process
comprises a method of identifying at least one polypeptide
comprising a functional domain of interest, said method
comprising:

(a) contacting one or more multivalent recognition
unit complexes with a plurality of polypeptides; and

(b) identifying at least one polypeptide having
selective binding affinity for at least one of said
recognition unit complexes.

In another specific embodiment, the process
comprises:

(a) contacting a multivalent recognition unit
complex, which complex comprises (i) avidin or streptavidin,
and (ii) biotinylated recognition units, with a plurality of
polypeptides from a cDNA expression library, in which the
recognition unit is a peptide having in the range of 6 to 60
amino acid residues; and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

In another specific embodiment, the process
comprises a method of identifying a polypeptide having an SH3
domain of interest comprising:

(a) contacting a multivalent recognition unit
complex, which complex comprises (i) avidin or streptavidin,
and (ii) biotinylated recognition units, with a plurality of
polypeptides from a cDNA expression library, in which the
recognition unit is a peptide having in the range of 6 to 60
amino acid residues and which selectively binds an SH3 domain;
and

(b) identifying a polypeptide having a selective
binding affinity for said recognition unit complex.

In another specific embodiment, the process
comprises a method of identifying a polypeptide having a
functional domain of interest or a functional equivalent
thereof comprising:

(a) screening a random peptide library to identify a
peptide that selectively binds a functional domain of
interest; and

(b) screening a cDNA or genomic expression library
with said peptide or a binding portion thereof to identify a
polypeptide that selectively binds said peptide.

In a specific embodiment of the above method, the
screening step (b) is carried out by use of said peptide in
the form of multiple antigen peptides (MAP) or by use of said
peptide cross-linked to bovine serum albumin or keyhole limpet
hemocyanin.

In another specific embodiment, the process
comprises a method of identifying a polypeptide having a
functional domain of interest or a functional equivalent
thereof comprising:

(a) screening a random peptide library to identify a
plurality of peptides that selectively bind a functional
domain of interest;

(b) determining at least part of the amino acid
sequences of said peptides;

(c) determining a consensus sequence based upon the
determined amino acid sequences of said peptides; and

(d) screening a cDNA or genomic expression library
with a peptide comprising the consensus sequence to identify a
polypeptide that selectively binds said peptide.
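
Step (c) above, determining a consensus sequence, can be
illustrated with a simple column-wise majority rule. The sketch
below is an assumption-laden illustration rather than the
procedure used in the specification: it presumes the peptides
are already aligned to equal length, the 60% threshold is an
arbitrary choice, and the example sequences are invented.

```python
from collections import Counter
from typing import Sequence


def consensus(peptides: Sequence[str], min_fraction: float = 0.6) -> str:
    """Column-wise consensus of equal-length, pre-aligned peptides."""
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to equal length")
    out = []
    for column in zip(*peptides):
        # Most frequent residue in this column and how often it occurs.
        residue, count = Counter(column).most_common(1)[0]
        # Keep it only if it is frequent enough; otherwise mark the position 'x'.
        out.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(out)


# Invented sequences for illustration only.
binders = ["RPLPPLP", "RALPPLP", "SPLPALP", "KGLPTLP"]
motif = consensus(binders)
print(motif)
```

Positions where no residue reaches the threshold are written as
"x", so the result reads as a motif with fixed and variable
positions that could then be used to design the screening
peptide of step (d).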

In another specific embodiment, the process
comprises a method of identifying a polypeptide having a
functional domain of interest or a functional equivalent
thereof comprising:

(a) screening a random peptide library to identify a
first peptide that selectively binds a functional domain of
interest;

(b) determining at least part of the amino acid
sequence of said first peptide;

(c) searching a database containing the amino acid
sequences of a plurality of expressed natural proteins to
identify a protein containing an amino acid sequence
homologous to the amino acid sequence of said first peptide;
and

(d) screening a cDNA or genomic expression library
with a second peptide comprising the sequence of said protein
that is homologous to the amino acid sequence of said first
peptide.
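
The database search of step (c) above can be sketched as a
sliding-window comparison of the first peptide against each
stored protein sequence. This is a deliberately simplified
illustration: percent identity over an ungapped window stands in
for a real homology search, and the function name, the 70%
cutoff, and the toy database entries are all hypothetical.

```python
from typing import Dict, List, Tuple


def find_homologous_segments(
    query: str, database: Dict[str, str], min_identity: float = 0.7
) -> List[Tuple[str, int, str, float]]:
    """Report ungapped windows whose fraction of identical residues meets the cutoff."""
    hits = []
    width = len(query)
    for name, sequence in database.items():
        for start in range(len(sequence) - width + 1):
            window = sequence[start:start + width]
            identity = sum(a == b for a, b in zip(query, window)) / width
            if identity >= min_identity:
                hits.append((name, start, window, identity))
    # Best matches first.
    return sorted(hits, key=lambda hit: hit[3], reverse=True)


# Invented toy database; a real search would use a curated protein database.
toy_db = {
    "protA": "MKTAYRPLPPLPGSW",
    "protB": "MSSGGLVKAAEE",
}
hits = find_homologous_segments("RALPPLP", toy_db)
print(hits)
```

Each reported window is a candidate for the "second peptide" of
step (d), since it is the segment of the natural protein that
resembles the first peptide.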

The polypeptide identified by the above-
described methods thus should contain the functional domain of
interest or a functional equivalent thereof (that is, having a
functional domain that is identical, or having a functional
domain that differs in sequence but is capable of binding to
the same recognition unit). In a particular embodiment, the
polypeptide identified is a novel polypeptide. In a preferred
embodiment, the recognition unit that is used to form the
multivalent recognition unit complex is isolated or identified
from a random peptide library.

In a specific embodiment, the present invention
provides amino acid sequences and DNA sequences encoding novel
proteins containing SH3 domains. The SH3 domains vary in
sequence but retain binding specificity to an SH3 domain
recognition unit. Also provided are fragments and derivatives
of the novel proteins containing SH3 domains as well as DNA
sequences encoding the same. It will be apparent to one of
ordinary skill in the art that also provided are proteins that
vary slightly in sequence from the novel proteins by virtue of
conservative amino acid substitutions. It will also be
apparent to one of ordinary skill in the art that the novel
proteins may be expressed recombinantly by standard methods.
The novel proteins may also be expressed as fusion proteins
with a variety of other proteins, e.g., glutathione S-
transferase.

The present invention provides a purified
polypeptide comprising an SH3 domain, said SH3 domain having
an amino acid sequence selected from the group consisting of:
SEQ ID NOs: 113-115, 118-121, 125-128, 133-139, 204-218, and
219. Also provided is a purified DNA encoding the
polypeptide.

Also provided is a purified polypeptide comprising
an SH3 domain, said polypeptide having an amino acid sequence
selected from the group consisting of SEQ ID NOs: 8, 10, 12,
18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200,
and 221. Also provided is a purified DNA encoding the
polypeptide.

Also provided is a purified DNA encoding an SH3
domain, said DNA having a sequence selected from the group
consisting of SEQ ID NOs: 7, 9, 11, 17, 19, 21, 23, 29, 31,
37, 39, 189, 191, 193, 195, 197, 199, and 220. Also provided
is a nucleic acid vector comprising this purified DNA. Also
provided is a recombinant cell containing this nucleic acid
vector.

Also provided is a purified DNA encoding a
polypeptide having an amino acid sequence selected from the
group consisting of: SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24,
30, 32, 38, 40, 190, 192, 194, 196, 198, 200, and 221. Also
provided is a nucleic acid vector comprising this purified
DNA. Also provided is a recombinant cell containing this
nucleic acid vector.

Also provided is a purified DNA encoding a
polypeptide comprising an amino acid sequence selected from
the group consisting of: SEQ ID NOs: 113-115, 118-121, 125-128,
133-139, 204-218, and 219. Also provided is a nucleic acid
vector comprising this purified DNA. Also provided is a
recombinant cell containing this nucleic acid vector.



Also provided is a purified molecule comprising an
SH3 domain of a polypeptide having an amino acid sequence
selected from the group consisting of: SEQ ID NOs: 8, 10, 12,
18, 20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200,
and 221.

Also provided is a fusion protein comprising (a) an
amino acid sequence comprising an SH3 domain of a polypeptide
having the amino acid sequence of SEQ ID NO: 8, 10, 12, 18,
20, 22, 24, 30, 32, 38, 40, 190, 192, 194, 196, 198, 200, or
221 joined via a peptide bond to (b) an amino acid sequence of
at least six, or ten, or twenty amino acids from a different
polypeptide. Also provided is a purified DNA encoding the
fusion protein. Also provided is a nucleic acid vector
comprising the purified DNA encoding the fusion protein. Also
provided is a recombinant cell containing this nucleic acid
vector. Also provided is a method of producing this fusion
protein comprising culturing a recombinant cell containing a
nucleic acid vector encoding said fusion protein such that
said fusion protein is expressed, and recovering the expressed
fusion protein.

The present invention also provides a purified
nucleic acid hybridizable to a nucleic acid having a sequence
selected from the group consisting of: SEQ ID NOs: 7, 9, 11,
17, 19, 21, 23, 29, 31, 37, 39, 189, 191, 193, 195, 197, 199,
and 220.

The present invention also provides antibodies to a
polypeptide having an amino acid sequence selected from the
group consisting of: SEQ ID NOs: 113-115, 118-121, 125-128,
133-139, 204-218, and 219.

The present invention also provides antibodies to a
polypeptide having an amino acid sequence selected from the
group consisting of SEQ ID NOs: 8, 10, 12, 18, 20, 22, 24, 30,
32, 38, 40, 190, 192, 194, 196, 198, 200, and 221.

It is demonstrated by way of example herein that
recognition units that comprise SH3 domain ligands derived
from combinatorial peptide libraries may be used in the
methods of the present invention as probes for the rapid
discovery of novel proteins containing SH3 functional domains.
The methods of the present invention require no prior
knowledge of the characteristics of an SH3 domain's natural
cellular ligand to initiate the process of discovery. One
needs only enough purified SH3 domain-containing protein (by
way of example, 1-5 µg) to select peptides from a random
peptide library. In addition, because the methods of the
present invention identify novel proteins from cDNA expression
libraries based only on their binding properties, low primary
sequence identity between the target SH3 domain and the SH3
domains of the novel proteins discovered need not be a
limitation, provided some functional similarity between these
SH3 domains is conserved. Also, the methods of the present
invention are rapid, require inexpensive reagents, and employ
simple and well established laboratory techniques.

Using these methods, more than eighteen different
SH3 domain-containing proteins have been identified, over half
of which have not been previously described. While certain of
these previously unknown proteins are clearly related to known
genes such as amphiphysin and drebrin, others constitute new
classes of signal transduction and/or cytoskeletal proteins.
These include SH3P17 and SH3P18, two members of a new family
of adaptor-like proteins comprised of multiple SH3 domains;
SH3P12, a novel protein with three SH3 domains and a region
similar to the extracellular peptide hormone sorbin; and
SH3P4, SH3P8, and SH3P13, three members of a third new family
of SH3-containing proteins. These novel proteins are
described more fully in Sections 6.1 and 6.1.1. The high
incidence of novel proteins identified by the methods of the
present invention indicates that a large number of SH3 domain-
containing proteins remain to be discovered by application of
the methods of the invention.

One of ordinary skill in the art would recognize
that the above-described novel proteins need not be used in
their entirety in the various applications of those proteins
described herein. In many cases it will be sufficient to
employ that portion of the novel protein that contains the
functional (e.g., SH3) domain. Such exemplary portions of SH3
domain-containing proteins are shown in Figures 10A and 10B.
Accordingly, the present invention provides derivatives (e.g.,
fragments and molecules comprising these fragments) of novel
proteins that contain SH3 domains, e.g., as shown in Figures
10A and 10B. Nucleic acids encoding these fragments or other
derivatives are also provided.

In another embodiment, the present invention
includes a method of identifying one or more novel
polypeptides having an SH3 domain, said method comprising:

(a) identifying a recognition unit having a
selective affinity for the SH3 domain by screening a peptide
library with the SH3 domain;

(b) producing said recognition unit;

(c) contacting said recognition unit with a source
of polypeptides; and

(d) identifying one or more novel polypeptides
having a selective affinity for said recognition unit, which
polypeptides comprise the SH3 domain.

5.1.1. Functional Domains

Functional domains of interest in the practice of
the present invention can take many forms and may perform a
variety of functions. For example, such functional domains
may be involved in a number of cellular, biochemical, or
physiological processes, such as cellular signal transduction,
transcriptional regulation, translational regulation, cell
adhesion, migration or transport, cytokine secretion and other
aspects of the immune response, and the like. In particular
embodiments of the present invention, the functional domains
of interest may consist of regions known as SH1, SH2, SH3, PH,
PTB, LIM, armadillo, and Notch/ankyrin repeat. See, e.g.,
Pawson, 1995, Nature 373:573-580; Cohen et al., 1995, Cell
80:237-248. Functional domains may also be chosen from among
regions known as zinc fingers, leucine zippers, and helix-
turn-helix or helix-loop-helix. Certain functional domains
may be binding domains, such as DNA-binding domains or actin-
binding domains. Still other functional domains may serve as
sites of catalytic activity.

In one embodiment of the invention, a suitable
target molecule containing the chosen functional domain of
interest is selected. In the case of an SH3 domain, for
example, a number of proteins (or functional domain-containing
derivatives or analogs thereof) may be selected as the target
molecule, including but not limited to, the Src family of
proteins: Fyn, Lck, Lyn, Src, or Yes. Still other proteins
contain an SH3 domain and can be used, including, but not
limited to: Abl, Crk, Nck (other oncogenes), Grb2, PLCγ,
RasGAP (proteins involved in signal transduction), ABP-1,
myosin-1, spectrin (proteins found in the cytoskeleton), and
neutrophil NADPH oxidase (an enzyme). In the case of a
catalytic site, any catalytically active protein, such as an
enzyme, can be used, particularly one whose catalytic site is
known. For example, the catalytic site of the protein
glutathione S-transferase (GST) can be used. Other target
molecules that possess catalytic activity may include, but are
not limited to, protein serine/threonine kinases, protein
tyrosine kinases, serine proteases, DNA or RNA polymerases,
phospholipases, GTPases, ATPases, PI-kinases, DNA methylases,
metabolic enzymes, or protein glycosylases.

5.1.2. Recognition Units

By the phrase "recognition unit" is meant any
molecule having a selective affinity for the functional domain
of the target molecule and, preferably, having a molecular
weight of up to about 20,000 daltons. In a particular
embodiment of the invention, the recognition unit has a
molecular weight that ranges from about 100 to about 10,000
daltons.

Accordingly, preferred recognition units of the
present invention possess a molecular weight of about 100 to
about 5,000 daltons, preferably from about 100 to about 2,000
daltons, and most preferably from about 500 to about 1,500
daltons. As described further below, the recognition unit of
the present invention can be a peptide, a carbohydrate, a
nucleoside, an oligonucleotide, any small synthetic molecule,
or a natural product. When the recognition unit is a peptide,
the peptide preferably contains about 6 to about 60 amino acid
residues.

When the recognition unit is a peptide, the peptide
can have less than about 140 amino acid residues; preferably,
the peptide has less than about 100 amino acid residues;
preferably, the peptide has less than about 70 amino acid
residues; preferably, the peptide has 20 to 50 amino acid
residues; most preferably, the peptide has about 6 to 60 amino
acid residues.
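
To relate the residue counts above to the molecular weight
ranges stated for recognition units, a rough conversion can be
made. The sketch below assumes an average mass of about 110
daltons per amino acid residue, a common rule of thumb that is
not taken from the specification; actual masses depend on the
composition of the peptide.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average mass per residue


def estimate_peptide_mass(n_residues: int) -> float:
    """Rough mass in daltons for a peptide of n_residues amino acids."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    return n_residues * AVG_RESIDUE_MASS_DA


# Compare a few peptide lengths against the stated dalton ranges.
for n in (6, 14, 60):
    print(n, estimate_peptide_mass(n))
```

Under this assumption a 6-residue peptide is on the order of
660 daltons and a 60-residue peptide on the order of 6,600
daltons, so the most preferred range of about 500 to about
1,500 daltons would correspond to peptides of roughly 5 to 14
residues.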

The peptide recognition units are preferably in the
form of a multivalent peptide complex comprising avidin or
streptavidin (optionally conjugated to a label such as
alkaline phosphatase or horseradish peroxidase) and
biotinylated peptides.

According to the present invention, a recognition
unit (preferably in the form of a multivalent recognition unit
complex) is used to screen a plurality of expression products
of gene sequences containing nucleic acid sequences that are
present in native RNA or DNA (e.g., cDNA library, genomic
library).

The step of choosing a recognition unit can be
accomplished in a number of ways that are known to those of
ordinary skill, including but not limited to screening cDNA
libraries or random peptide libraries for a peptide that binds
to the functional domain of interest. See, e.g., Yu et al.,
1994, Cell 76:933-945; Sparks et al., 1994, J. Biol. Chem.
269:23853-23856. Alternatively, a peptide or other small
molecule or drug may be known to those of ordinary skill to
bind to a certain target molecule and can be used. The
recognition unit can even be synthesized from a lead compound,
which again may be a peptide, carbohydrate, oligonucleotide,
small drug molecule, or the like. The recognition unit can
also be identified for use by doing searches (preferably via
database) for molecules having homology to other, known
recognition unit(s) having the ability to selectively bind to
the functional domain of interest.

In a specific embodiment, the step of selecting a
recognition unit for use can be effected by, e.g., the use of
diversity libraries, such as random or combinatorial peptide
or nonpeptide libraries, which can be screened for molecules
that specifically bind to the functional domain of interest,
e.g., an SH3 domain. Many libraries are known in the art that
can be used, e.g., chemically synthesized libraries,
recombinant libraries (e.g., phage display libraries), and
in vitro translation-based libraries.

Examples of chemically synthesized libraries are
described in Fodor et al., 1991, Science 251:767-773; Houghten
et al., 1991, Nature 354:84-86; Lam et al., 1991,
Nature 354:82-84; Medynski, 1994, Bio/Technology 12:709-710;
Gallop et al., 1994, J. Medicinal Chemistry 37(9):1233-1251;
Ohlmeyer et al., 1993, Proc. Natl. Acad. Sci. USA
90:10922-10926; Erb et al., 1994, Proc. Natl. Acad. Sci. USA
91:11422-11426; Houghten et al., 1992, BioTechniques 13:412;
Jayawickreme et al., 1994, Proc. Natl. Acad. Sci. USA
91:1614-1618; Salmon et al., 1993, Proc. Natl. Acad. Sci. USA
90:11708-11712; PCT Publication No. WO 93/20242; and Brenner
and Lerner, 1992, Proc. Natl. Acad. Sci. USA 89:5381-5383.

Examples of phage display libraries are described in
Scott and Smith, 1990, Science 249:386-390; Devlin et al.,
1990, Science 249:404-406; Christian et al., 1992, J.
Mol. Biol. 227:711-718; Lenstra, 1992, J. Immunol. Meth.
152:149-157; Kay et al., 1993, Gene 128:59-65; and PCT
Publication No. WO 94/18318 dated August 18, 1994.

In vitro translation-based libraries include but are
not limited to those described in PCT Publication No.
WO 91/05058 dated April 18, 1991; and Mattheakis et al., 1994,
Proc. Natl. Acad. Sci. USA 91:9022-9026.

By way of examples of nonpeptide libraries, a
benzodiazepine library (see e.g., Bunin et al., 1994, Proc.
Natl. Acad. Sci. USA 91:4708-4712) can be adapted for use.
Peptoid libraries (Simon et al., 1992, Proc. Natl. Acad. Sci.
USA 89:9367-9371) can also be used. Another example of a
library that can be used, in which the amide functionalities
in peptides have been permethylated to generate a chemically
transformed combinatorial library, is described by Ostresh et
al. (1994, Proc. Natl. Acad. Sci. USA 91:11138-11142).

The variety of non-peptide libraries that are useful
in the present invention is great. For example, Ecker and
Crooke, 1995, Bio/Technology 13:351-360 list benzodiazepines,
hydantoins, piperazinediones, biphenyls, sugar analogs,
β-mercaptoketones, arylacetic acids, acylpiperidines,
benzopyrans, cubanes, xanthines, aminimides, and oxazolones as
among the chemical species that form the basis of various
libraries.

Non-peptide libraries can be classified broadly into
two types: decorated monomers and oligomers. Decorated
monomer libraries employ a relatively simple scaffold structure
upon which a variety of functional groups is added. Often the
scaffold will be a molecule with a known useful
pharmacological activity. For example, the scaffold might be
the benzodiazepine structure.

Non-peptide oligomer libraries utilize a large
number of monomers that are assembled together in ways that
create new shapes that depend on the order of the monomers.
Among the monomer units that have been used are carbamates,
pyrrolinones, and morpholinos. Peptoids, peptide-like
oligomers in which the side chain is attached to the α-amino
group rather than the α-carbon, form the basis of another
version of non-peptide oligomer libraries. The first
non-peptide oligomer libraries utilized a single type of monomer
and thus contained a repeating backbone. Recent libraries
have utilized more than one monomer, giving the libraries
added flexibility.

Screening the libraries can be accomplished by any
of a variety of commonly known methods. See, e.g., the
following references, which disclose screening of peptide
libraries: Parmley and Smith, 1989, Adv. Exp. Med. Biol.
251:215-218; Scott and Smith, 1990, Science 249:386-390;
Fowlkes et al., 1992, BioTechniques 13:422-427; Oldenburg et
al., 1992, Proc. Natl. Acad. Sci. USA 89:5393-5397; Yu et al.,
1994, Cell 76:933-945; Staudt et al., 1988, Science
241:577-580; Bock et al., 1992, Nature 355:564-566; Tuerk et al.,
1992, Proc. Natl. Acad. Sci. USA 89:6988-6992; Ellington et
al., 1992, Nature 355:850-852; U.S. Patent No. 5,096,815,
U.S. Patent No. 5,223,409, and U.S. Patent No. 5,198,346, all
to Ladner et al.; Rebar and Pabo, 1993, Science 263:671-673;
and PCT Publication No. WO 94/18318.

In a specific embodiment, screening to identify a
recognition unit can be carried out by contacting the library
members with an SH3 domain immobilized on a solid phase and
harvesting those library members that bind to the SH3 domain.
Examples of such screening methods, termed "panning"
techniques, are described by way of example in Parmley and
Smith, 1988, Gene 73:305-318; Fowlkes et al., 1992,
BioTechniques 13:422-427; PCT Publication No. WO 94/18318; and
in references cited hereinabove.

In another embodiment, the two-hybrid system for
selecting interacting proteins in yeast (Fields and Song,
1989, Nature 340:245-246; Chien et al., 1991, Proc. Natl.
Acad. Sci. USA 88:9578-9582) can be used to identify
recognition units that specifically bind to SH3 domains.

Where the recognition unit is a peptide, the peptide
can be conveniently selected from any peptide library,
including random peptide libraries, combinatorial peptide
libraries, or biased peptide libraries. The term "biased" is
used herein to mean that the method of generating the library
is manipulated so as to restrict one or more parameters that
govern the diversity of the resulting collection of molecules,
in this case peptides.

Thus, a truly random peptide library would generate
a collection of peptides in which the probability of finding a
particular amino acid at a given position of the peptide is
the same for all 20 amino acids. A bias can be introduced
into the library, however, by specifying, for example, that a
lysine occur every fifth amino acid or that positions 4, 8,
and 9 of a decapeptide library be fixed to include only
arginine. Clearly, many types of biases can be contemplated,
and the present invention is not restricted to any particular
bias. Furthermore, the present invention contemplates
specific types of peptide libraries, such as phage-displayed
peptide libraries and those that utilize a DNA construct
comprising a lambda phage vector with a DNA insert.
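
The distinction between a random and a biased library can be illustrated with a minimal sketch. It is offered only as an illustration under stated assumptions: the function names and the use of one-letter residue codes are hypothetical and are not part of the methods described herein. It draws each unconstrained position uniformly from the 20 amino acids and fixes selected positions as in the two examples given above.

```python
import random

# The 20 standard amino acids in one-letter code (assumed representation).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, fixed=None, rng=random):
    """Return one peptide of the given length.

    `fixed` maps 1-based positions to a required residue (the "bias");
    every other position is drawn uniformly from the 20 amino acids.
    """
    fixed = fixed or {}
    residues = []
    for pos in range(1, length + 1):
        if pos in fixed:
            residues.append(fixed[pos])
        else:
            residues.append(rng.choice(AMINO_ACIDS))
    return "".join(residues)


# Bias from the text: positions 4, 8, and 9 of a decapeptide fixed to arginine.
ARG_BIAS = {4: "R", 8: "R", 9: "R"}


def lysine_every_fifth(length):
    """Bias from the text: a lysine at every fifth position."""
    return {pos: "K" for pos in range(5, length + 1, 5)}
```

Calling random_peptide(10) with no bias models a member of a truly random decapeptide library, whereas random_peptide(10, ARG_BIAS) models a member of the arginine-biased library.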

As mentioned above, in the case of a recognition
unit that is a peptide, the peptide may have about 6 to less
than about 60 amino acid residues, preferably about 6 to about
25 amino acid residues, and most preferably, about 6 to about
15 amino acids. In another embodiment, a peptide recognition
unit has in the range of 20-100 amino acids, or 20-50 amino
acids. In the case of a bile acid receptor, for example, the
recognition unit may be a bile acid, such as cholic acid or
cholesterol, and may have a molecular weight of about 300 to
about 600. If the functional domain relates to
transcriptional control, the recognition unit may be a portion
of a transcriptional factor, which may bind to a region of a
gene of interest or to an RNA polymerase. The recognition
unit may even be a nucleoside analog, such as cordycepin or
the triphosphate thereof, capable of inhibiting RNA
biosynthesis. The recognition unit may also be the
carbohydrate portion of a glycoprotein, which may have a
selective affinity for the asialoglycoprotein receptor, or the
repeating glucan unit that exhibits a selective affinity for a
cellulose binding domain or the active site of heparinase.

The selected recognition unit can be obtained by
chemical synthesis or recombinant expression. It is
preferably purified prior to use in screening a plurality of
gene sequences.

5.1.3. Screening a Source of Polypeptides 

After the recognition unit is chosen for use, the
recognition unit is then contacted with a plurality of
polypeptides, preferably containing a functional domain. In a
particular embodiment of the invention, the plurality of
polypeptides is obtained from a polypeptide expression
library. The polypeptide expression library may be obtained,
in turn, from cDNA, fragmented genomic DNA, and the like. In
a specific embodiment, the library that is screened is a cDNA
library of total poly A+ RNA of an organism, in general, or of
a particular cell or tissue type or developmental stage or
disease condition or stage. The expression library may
utilize a number of expression vehicles known to those of
ordinary skill, including but not limited to, recombinant
bacteriophage, lambda phage, M13, a recombinant plasmid or
cosmid, and the like.

The plurality of polypeptides or the DNA sequences
encoding same may be obtained from a variety of natural or
unnatural sources, such as a procaryotic or a eucaryotic cell,
whether wild type, recombinant, or mutant. In particular,
the plurality of polypeptides may be endogenous to
microorganisms, such as bacteria, yeast, or fungi, to a virus,
to an animal (including mammals, invertebrates, reptiles,
birds, and insects) or to a plant cell.

In addition, the plurality of polypeptides may be
obtained from more specific sources, such as the surface coat
of a virion particle, a particular cell lysate, a tissue
extract, or they may be restricted to those polypeptides that
are expressed on the surface of a cell membrane.

Moreover, the plurality of polypeptides may be
obtained from a biological fluid, particularly from humans,
including but not limited to blood, plasma, serum, urine,
feces, mucus, semen, vaginal fluid, amniotic fluid, or
cerebrospinal fluid. The plurality of polypeptides may even
be obtained from a fermentation broth or a conditioned medium,
including all the polypeptide products secreted or produced by
the cells previously in the broth or medium.

The step of contacting the recognition unit with the
plurality of polypeptides may be effected in a number of ways.
For example, one may contemplate immobilizing the recognition
unit on a solid support and bringing a solution of the
plurality of polypeptides in contact with the immobilized
recognition unit. Such a procedure would be akin to an
affinity chromatographic process, with the affinity matrix
being comprised of the immobilized recognition unit. The
polypeptides having a selective affinity for the recognition
unit can then be purified by affinity selection. The nature
of the solid support, process for attachment of the
recognition unit to the solid support, solvent, and conditions
of the affinity isolation or selection procedure would depend
on the type of recognition unit in use but would be largely
conventional and well known to those of ordinary skill in the
art. Moreover, the valency of the recognition unit in the
recognition unit complex used to screen the polypeptides is
believed to affect the specificity of the screening step, and
thus the valency can be chosen as appropriate in view of the
desired specificity (see Sections 5.2 and 5.2.1).

Alternatively, one may also separate the plurality
of polypeptides into substantially separate fractions
comprising individual polypeptides. For instance, one can
separate the plurality of polypeptides by gel electrophoresis,
column chromatography, or a like method known to those of
ordinary skill for the separation of polypeptides. The
individual polypeptides can also be produced by a transformed
host cell in such a way as to be expressed on or about its
outer surface. Individual isolates can then be "probed" by
the recognition unit, optionally in the presence of an inducer
should one be required for expression, to determine if any
selective affinity interaction takes place between the
recognition unit and the individual clone. Prior to
contacting the recognition unit with each fraction comprising
individual polypeptides, the polypeptides can optionally first
be transferred to a solid support for additional convenience.
Such a solid support may simply be a piece of filter membrane,
such as one made of nitrocellulose or nylon.

In this manner, positive clones can be identified
from a collection of transformed host cells of an expression
library, which harbor a DNA construct encoding a polypeptide
having a selective affinity for the recognition unit. The
polypeptide produced by the positive clone includes the
functional domain of interest or a functional equivalent
thereof. Furthermore, the amino acid sequence of the
polypeptide having a selective affinity for the recognition
unit can be determined directly by conventional means of amino
acid sequencing, or the coding sequence of the DNA encoding
the polypeptide can frequently be determined more conveniently
by use of standard DNA sequencing methods. The primary
sequence can then be deduced from the corresponding DNA
sequence.

If the amino acid sequence is to be determined from
the polypeptide itself, one may use microsequencing
techniques. The sequencing technique may include mass
spectroscopy.

In certain situations, it may be desirable to wash
away any unbound recognition unit from a mixture of the
recognition unit and the plurality of polypeptides prior to
attempting to determine or to detect the presence of a
selective affinity interaction (i.e., the presence of a
recognition unit that remains bound after the washing step).
Such a wash step may be particularly desirable when the
plurality of polypeptides is bound to a solid support.

As can be anticipated, the degree of selective
affinities observed varies widely, generally falling in the
range of about 1 nM to about 1 mM. In preferred embodiments
of the present invention, the selective affinity is on the
order of about 10 nM to about 100 µM, more preferably on the
order of about 100 nM to about 10 µM, and most preferably on
the order of about 100 nM to about 1 µM.
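
The nested affinity ranges stated above can be summarized in a short sketch. It assumes, purely for illustration, that the selective affinity is expressed as a dissociation constant in molar units; the function name and the tier labels are hypothetical.

```python
def affinity_tier(kd_molar):
    """Return the narrowest stated range that contains the given affinity.

    Assumption: the affinity is a dissociation constant in mol/L.
    Ranges are checked from narrowest to widest, so the first match wins.
    """
    tiers = [
        ("most preferred", 100e-9, 1e-6),    # about 100 nM to about 1 uM
        ("more preferred", 100e-9, 10e-6),   # about 100 nM to about 10 uM
        ("preferred", 10e-9, 100e-6),        # about 10 nM to about 100 uM
        ("generally observed", 1e-9, 1e-3),  # about 1 nM to about 1 mM
    ]
    for name, low, high in tiers:
        if low <= kd_molar <= high:
            return name
    return "outside stated ranges"
```

Because the ranges are stated with "about", the boundaries in this sketch should be read as approximate rather than as sharp cutoffs.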


5.2. Specificity of Recognition Units 

A particular recognition unit may have fairly
generic selectivity for several members (e.g., three or four
or more) of a "panel" of polypeptides having the domain of
interest (or different versions of the domain of interest or
functional equivalents of the domain of interest) or a fairly
specific selectivity for only one or two, or possibly three,
of the polypeptides among a "panel" of same. Furthermore,
multiple recognition units, each exhibiting a range of
selectivities among a "panel" of polypeptides, can be used to
identify an increasingly comprehensive set of additional
polypeptides that include the functional domain of interest.

Hence, in a population of related polypeptides, the
functional domains of interest of each member may be
schematically represented by a circle. See, by way of
example, Figure 7A. The circle of one polypeptide may overlap
with that of another polypeptide. Such overlaps may be few or
numerous for each polypeptide. A particular recognition unit,
A, may recognize or interact with a portion of the circle of a
given polypeptide which does not overlap with any other
circle. Such a recognition unit would be fairly specific to
that polypeptide. On the other hand, a second recognition
unit, B, may recognize a region of overlap between two or more
polypeptides. Such a recognition unit would consequently be
less specific than the recognition unit A and may be
characterized as having a more generic specificity depending
on the number of polypeptides that it recognizes or interacts
with.
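
The overlapping-circle picture can be restated as a simple set model. The sketch below is illustrative only: it assumes that each circle is represented as a set of abstract binding features and that a recognition unit recognizes exactly one feature. The panel, the feature names, and the function name are all hypothetical.

```python
def recognized_polypeptides(feature, panel):
    """Return the names of polypeptides whose domain contains `feature`.

    `panel` maps a polypeptide name to the set of features making up its
    functional domain (its "circle"). A feature shared by two sets is a
    region of overlap between the two circles.
    """
    return sorted(name for name, features in panel.items() if feature in features)


# Hypothetical panel: f2 is shared by P1 and P2, f3 by P2 and P3.
PANEL = {
    "P1": {"f1", "f2"},
    "P2": {"f2", "f3"},
    "P3": {"f3", "f4"},
}

# Recognition unit A binds a non-overlapping feature; B binds an overlap.
UNIT_A_FEATURE = "f1"
UNIT_B_FEATURE = "f2"
```

In this model, the number of names returned for a feature is a rough measure of how generic the corresponding recognition unit is.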

It should also be apparent to those of ordinary
skill that any number of B-type recognition units (B1, B2, B3,
etc.) can be present, each recognizing different "panels" of
polypeptides. Hence, the use of multiple recognition units
provides an increasingly more exhaustive population of
polypeptides, each of which exhibits a variation or evolution
in the functional domain of interest present in the initial
target molecule. It should also be apparent to one that the
present method can be applied in an iterative fashion, such
that the identification of a particular polypeptide can lead
to the choice of another recognition unit. See, e.g., Figure
7B. Use of this new recognition unit will lead, in turn, to
the identification of other polypeptides that contain
functional domains of interest that enhance the phenotypic
and/or genotypic diversity of the population of "related"
polypeptides.
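
The iterative application of the method can be sketched as a loop in which each newly identified polypeptide suggests a further recognition unit. This is a minimal sketch under stated assumptions: `screen` and `derive_unit` are hypothetical callables standing in for the laboratory steps of screening a source of polypeptides and of choosing a new recognition unit, respectively, and the round limit is arbitrary.

```python
def iterative_screen(initial_unit, screen, derive_unit, max_rounds=5):
    """Collect polypeptides found by repeated rounds of screening.

    screen(unit)        -> iterable of polypeptides bound by `unit`
    derive_unit(polypeptide) -> a new recognition unit for that polypeptide
    Both callables are placeholders for experimental steps.
    """
    found = set()
    tried = set()
    pending = [initial_unit]
    for _ in range(max_rounds):
        if not pending:
            break
        next_units = []
        for unit in pending:
            if unit in tried:
                continue
            tried.add(unit)
            for polypeptide in screen(unit):
                if polypeptide not in found:
                    # Each new polypeptide can lead to another recognition unit.
                    found.add(polypeptide)
                    next_units.append(derive_unit(polypeptide))
        pending = next_units
    return found
```

The loop stops when a round yields no new polypeptides or when the round limit is reached, whichever comes first.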

Hence, with a given recognition unit, one may
observe interaction with only one or two different
polypeptides. With other recognition units, one may find
three, four, or more selective interactions. In the situation
in which only a single interaction is observed, it is likely,
though not mandatory, that the selective affinity interaction
is between the recognition unit and a replica of the initial
target molecule (or a molecule very similar structurally and
"functionally" to the initial target molecule).






5.2.1. Effect of the Presentation of the Recognition 
Unit Complex on the Specificity of the 
Recognition Unit-Functional Domain Interaction

The present inventors have found, unexpectedly, that 
the valency (i.e., whether it is a monomer, dimer, tetramer, 
etc.) of the recognition unit that is used to screen an 
expression library or other source of polypeptides apparently 
has a marked effect upon which genes or polypeptides are 
identified from the expression library or source of 
polypeptides. In particular, the specificity of the 
recognition unit-functional domain interaction appears to be 
affected by the valency of the recognition unit in the 
screening process. By this specificity is meant the 
selectivity in the functional domains to which the recognition 
unit will bind in the screening step. 

As discussed above, in one embodiment, recognition
units are obtained by screening a source of recognition units,
e.g., a phage display library, for recognition units that bind
to a particular target functional domain. Alternatively,
database searches for recognition units with sequence homology
to known recognition units can be employed. Of course, if a
recognition unit for a particular target functional domain is
already known, there is no need to screen a library or other
source of recognition units; one can merely synthesize that
particular recognition unit. The recognition unit, however
obtained, is then used to screen an expression library or
other source of polypeptides, to identify polypeptides that
the recognition unit binds to. A recognition unit that
identifies only its target functional domain is a recognition
unit that is completely specific. A recognition unit that
identifies one or two other polypeptides that do not contain
identically the target functional domain, from among a
plurality of polypeptides (e.g., of greater than 10^4, 10^6, or
10^8 complexity), in addition to identifying a molecule
comprising its target functional domain, is very or highly
specific. A recognition unit that identifies most other
polypeptides present that do not contain its target functional
domain, in addition to identifying its target functional
domain, is a non-specific recognition unit. In between very
specific recognition units and non-specific recognition units,
the present inventors have discovered that there are
recognition units that recognize a small number of molecules
having functional domains other than their target functional
domains. These recognition units are said to have generic
specificity.

Thus, there is a "specificity continuum", from
completely and very specific through generic to non-specific,
that a recognition unit may evince. See Figure 11 for a
depiction of this specificity continuum. The Applicants have
discovered that a major factor influencing the specificity
exhibited by a recognition unit appears to be the valency of
the recognition unit in the complex used to screen the
expression library.
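
The four regions of the specificity continuum can be restated as a small classification sketch. The first two cutoffs (no other polypeptides; one or two other polypeptides) follow the definitions given above. Reading "most other polypeptides" as more than half of those screened is an assumption made only for this illustration, and the function name is hypothetical.

```python
def classify_specificity(off_target_hits, others_screened):
    """Place a recognition unit on the specificity continuum.

    off_target_hits: polypeptides identified that lack the exact target domain
    others_screened: total number of non-target polypeptides in the source
    """
    if off_target_hits < 0 or off_target_hits > others_screened:
        raise ValueError("off_target_hits must lie between 0 and others_screened")
    if off_target_hits == 0:
        return "completely specific"
    if off_target_hits <= 2:
        return "highly specific"
    # Assumption: "most" is taken to mean more than half of those screened.
    if off_target_hits > others_screened / 2:
        return "non-specific"
    return "generic"
```

For a library of 10^6 complexity, for example, a recognition unit that identifies a dozen additional polypeptides would fall in the generic region under these assumptions.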

Usually, high specificity is considered to be
desirable when screening a library. High specificity is
exhibited, e.g., by affinity purified polyclonal antisera
which, in general, are very specific. Monoclonal antibodies
are also very specific. Small peptides in monovalent form, on
the other hand, generally give very weak, non-specific signals
when used to screen a library; thus, they are considered to be
non-specific.

The present inventors have discovered that
recognition units in the form of small peptides, in
multivalent form, have a specificity midway between the high
specificity of antibodies and the low/non-specificity of
monovalent peptides. Multivalency of the recognition unit of
at least two, in a recognition unit complex used to screen the
gene library, is preferred, with a multivalency of at least
four more preferred, to obtain a screening wherein specificity
is eased but not forfeited. In particular, a multivalent
(believed to be tetravalent) recognition unit complex
comprising streptavidin or avidin (preferably conjugated to a
label, e.g., an enzyme such as alkaline phosphatase or
horseradish peroxidase, or a fluorogen, e.g., green fluorescent
protein) and biotinylated peptide recognition units has an
unexpected generic specificity. This allows such peptides to
be used to screen libraries to identify classes of
polypeptides containing functional domains that are similar
but not identical to the peptides' target functional domains.
These classes of polypeptides are identified despite the low
level of homology at the amino acid level of the functional
domains of the members of the classes.

In another specific embodiment, multivalent peptide
recognition units may be in the form of multiple antigen
peptides (MAP) (Tam, 1989, J. Imm. Meth. 124:53-61; Tam, 1988,
Proc. Natl. Acad. Sci. USA 85:5409-5413). In this form, the
peptide recognition unit is synthesized on a branching lysyl
matrix using solid-phase peptide synthesis methods.
Recognition units in the form of MAP may be prepared by
methods known in the art (Tam, 1989, J. Imm. Meth. 124:53-61;
Tam, 1988, Proc. Natl. Acad. Sci. USA 85:5409-5413), or, for
example, by a stepwise solid-phase procedure on MAP resins
(Applied Biosystems), utilizing methodology established by the
manufacturer. MAP peptides may be synthesized comprising
(recognition unit peptide)2Lys1, (recognition unit
peptide)4Lys3, (recognition unit peptide)8Lys7, or more levels
of branching.
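
The relationship between the number of branching levels and the composition of a MAP can be sketched as follows. The sketch assumes a strictly binary lysyl core, in which each lysine presents two amino groups and therefore doubles the number of attachment points at each level; the function name is hypothetical.

```python
def map_composition(levels):
    """Return (peptide_copies, lysine_residues) for a binary lysyl core.

    Assumption: every lysine branches into exactly two arms, so a core
    with n levels carries 2**n peptide copies on 2**n - 1 lysines.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    peptide_copies = 2 ** levels
    lysine_residues = peptide_copies - 1
    return peptide_copies, lysine_residues
```

Under this assumption, one, two, and three levels of branching give valencies of two, four, and eight, respectively.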

The multivalent peptide recognition unit complexes
may also be prepared by cross-linking the peptide to a carrier
protein, e.g., bovine serum albumin (BSA), keyhole limpet
hemocyanin (KLH), or an enzyme, by use of known cross-linking
reagents. Such cross-linked peptide recognition units may be
detected by, e.g., an antibody to the carrier protein or
detection of the enzymatic activity of the carrier protein.

Furthermore, the present inventors have discovered
what specificity is exhibited by various types of recognition
units and their complexes, i.e., where these recognition units
and their complexes fall in the specificity continuum. The
present inventors have discovered a range of formats for
presenting recognition units used to screen libraries. For
example, the present inventors have determined that a peptide
in the form of a bivalent fusion protein with alkaline
phosphatase is very specific. The same peptide in the form of
a fusion protein with the pIII protein of an M13-derived
bacteriophage, expressed on the phage surface, has somewhat
less, though still high, specificity. That same peptide when
biotinylated in the form of a tetravalent
streptavidin-alkaline phosphatase complex has generic specificity. Use of
such a generically specific peptide permits the identification
of a wide range of proteins from expression libraries or other
sources of polypeptides, each protein containing an example of
a particular functional domain.

Accordingly, the present invention provides a method
of modulating the specificity of a peptide such that the
peptide can be used as a recognition unit to screen a
plurality of polypeptides, thus identifying polypeptides that
have a functional domain. In a specific embodiment,
specificity is generic so as to provide for the identification
of polypeptides having a functional domain that varies in
sequence from that of the target functional domain known to
bind the recognition unit under conditions of high
specificity. In a particular embodiment, the method comprises
forming a tetravalent complex of the biotinylated peptide and
streptavidin-alkaline phosphatase prior to use for screening
an expression library.

5.3. Kits

The present invention is also directed to an assay
kit which can be useful in the screening of drug candidates.
In a particular embodiment of the present invention, an assay
kit is contemplated which comprises in one or more containers
(a) a polypeptide containing a functional domain of interest;
and (b) a recognition unit having a selective affinity for the
polypeptide. The kit optionally further comprises a detection
means for determining the presence of a
polypeptide-recognition unit interaction or the absence thereof.

In a specific embodiment, either the polypeptide
containing the functional domain or the recognition unit is
labeled. A wide range of labels can be used to advantage in
the present invention, including but not limited to
conjugating the recognition unit to biotin by conventional
means. Alternatively, the label may comprise a fluorogen, an
enzyme, an epitope, a chromogen, or a radionuclide.
Preferably, the biotin is conjugated by covalent attachment to
either the polypeptide or the recognition unit. The
polypeptide or, preferably, the recognition unit is
immobilized on a solid support. The detection means employed
to detect the label will depend on the nature of the label and
can be any known in the art, e.g., film to detect a
radionuclide; an enzyme substrate that gives rise to a
detectable signal to detect the presence of an enzyme;
antibody to detect the presence of an epitope, etc.

A further embodiment of the assay kit of the present
invention includes the use of a plurality of polypeptides,
each polypeptide containing a functional domain of interest.
The assay kit further comprises at least one recognition unit
having a selective affinity for each of the plurality of
polypeptides and a detection means for determining the
presence of a polypeptide-recognition unit interaction or the
absence thereof.

A kit is provided that comprises, in one or more
containers, a first molecule comprising an SH3 domain and a
second molecule that binds to the SH3 domain, i.e., a
recognition unit, where the SH3 domain is a novel SH3 domain
identified by the methods of the present invention.

In a specific embodiment, the present invention
provides an assay kit comprising in one or more containers:
(a) a purified polypeptide containing a functional
domain of interest, in which the functional domain of interest
is a domain selected from the group consisting of an SH1, SH2,
SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc
finger, leucine zipper, and helix-turn-helix; and
(b) a purified recognition unit having a selective
binding affinity for said functional domain in said
polypeptide.

In the above assay kit, the polypeptide may comprise
an amino acid sequence selected from the group consisting of
SEQ ID NOs:8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190,
192, 194, 196, 198, 200, 221, 113-115, 118-121, 125-128,
133-139, 204-218, and 219.

In the above assay kit, the polypeptide may comprise
an amino acid sequence selected from the group consisting of
SEQ ID NOs:6, 14, 16, 26, 28, 34, 36, 112, 116, 117, 122-124,
129-132, and 140.

In other embodiments of the above-described assay
kit, the recognition unit may be a peptide. The recognition
unit may be labeled with, e.g., an enzyme, an epitope, a
chromogen, or biotin.

In another specific embodiment, the present
invention provides an assay kit comprising in containers:
(a) a plurality of purified polypeptides, each
polypeptide in a separate container and each polypeptide
containing a functional domain of interest in which the
functional domain of interest is a domain selected from the
group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc fingers, leucine zippers, and
helix-turn-helix; and
(b) at least one recognition unit having a
selective binding affinity for said functional domain in each
of said plurality of polypeptides.

The present invention also provides an assay kit
comprising in one or more containers:
(a) a plurality of purified polypeptides, each
polypeptide in a separate container and each polypeptide
containing an SH3 domain; and
(b) at least one peptide having a selective
affinity for the SH3 domain in each of said plurality of
polypeptides.

The present invention also provides a kit comprising
a plurality of purified polypeptides comprising a functional
domain of interest, each polypeptide in a separate container,
and each polypeptide having a functional domain of a different
sequence but capable of displaying the same binding
specificity.

In the above-described kits, the polypeptides may
have an amino acid sequence selected from the group consisting
of: SEQ ID NOs:8, 10, 12, 18, 20, 22, 24, 30, 32, 38, 40, 190,
192, 194, 196, 198, 200, and 221.

In the above-described kits, the functional domain
may be an SH3 domain.

The molecular components of the kits are preferably 

purified. 

The kits of the present invention may be used in the
methods for identifying new drug candidates and determining
the specificities thereof that are described in Section 5.4.

5.4. Assays for the Identification of Potential Drug
Candidates and Determining the Specificity Thereof

The present invention also provides methods for
identifying potential drug candidates (and lead compounds) and
determining the specificities thereof. For example, knowing
that a polypeptide with a functional domain of interest and a
recognition unit, e.g., a binding peptide, exhibit a selective
affinity for each other, one may attempt to identify a drug
that can exert an effect on the polypeptide-recognition unit
interaction, e.g., either as an agonist or as an antagonist
(inhibitor) of the interaction. With this assay, one can
screen a collection of candidate "drugs" for the one
exhibiting the most desired characteristic, e.g., the most
efficacious in disrupting the interaction or in competing with
the recognition unit for binding to the polypeptide.
5 Alternatively, one may utilize the different 

selectivities that a particular recognition unit may exhibit 
for different polypeptides bearing the same, similar, or 
functionally equivalent functional domains. Thus, one may 
tailor the screen to identify drug candidates that exhibit 

10 more selective activities directed to specific polypeptide- 
recognition unit interactions, among the "panel" of 
possibilities. Thus, for example, a drug candidate may be 
screened to identify the presence or absence of an effect on 
particular binding interactions, potentially leading to 

15 undesirable side effects. 

Indeed, an intriguing application of the present
invention is described as follows. A known antiviral agent,
FIAU (a halogenated nucleoside analog), is effective at given
dosages against the virus that causes hepatitis B. This
compound is suspected of causing toxic side effects, however,
which give rise to liver failure in certain patients to whom
the drug is administered. According to the present invention,
an assay is provided which can be used to develop a new
generation of FIAU-derived drugs that maintain their
effectiveness against viral replication while reducing liver
toxicity. Such an assay is provided by choosing FIAU as a
recognition unit having a selective affinity for a polypeptide
present in the hepatitis B virus or a cell infected with the
virus. This polypeptide or family of polypeptides having the
functional domain of interest is obtained by allowing the
chosen recognition unit, FIAU, to come into contact with an
expression library comprised of the hepatitis B virus genome
and/or a cDNA expression library of infected cells, according
to the methods of the present invention.

Likewise, the chosen recognition unit is allowed to
come into contact with a plurality of polypeptides obtained
from a sample of a human liver extract or of noninfected
hepatocytes. In this manner, a "panel" of polypeptides each
of which exhibits a selective affinity for the chosen
recognition unit is identified. As described above, this
panel is used to determine the activities of drug (FIAU)
homologs, analogs, or derivatives in terms of, say, selective
inhibition of viral polypeptide-FIAU interaction versus liver
polypeptide-FIAU interaction. Hence, those drug homologs,
analogs, or derivatives that maintain a selective affinity for
the viral polypeptide (or infected cell polypeptide) while
failing to interact with or having a minimal binding affinity
for liver polypeptides (and, hence, have reduced toxicity in
the liver due to elimination of undesirable molecular
interactions) can be identified and selected. Additional
iterations of this process can be performed if so desired.
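
The panel comparison described above can be illustrated with a short sketch. It assumes that each homolog, analog, or derivative has been assigned a relative binding score against the viral polypeptide and against each liver polypeptide (1.0 meaning binding as strong as that of the parent recognition unit). The analog names, polypeptide labels, scores, and thresholds are hypothetical and are not data from this disclosure.

```python
# Hypothetical relative binding scores for each analog against a panel
# (1.0 = binds as strongly as the parent compound, 0.0 = no binding).
panel = {
    "analog_1": {"viral": 0.90, "liver_a": 0.80, "liver_b": 0.10},
    "analog_2": {"viral": 0.85, "liver_a": 0.05, "liver_b": 0.10},
    "analog_3": {"viral": 0.20, "liver_a": 0.05, "liver_b": 0.02},
}


def select_analogs(panel, target="viral", min_target=0.5, max_off_target=0.2):
    """Return analogs that bind the target but only minimally bind the rest.

    The two thresholds are illustrative assumptions, not recommended values.
    """
    selected = []
    for name, scores in panel.items():
        off_target = [v for k, v in scores.items() if k != target]
        if scores[target] >= min_target and max(off_target) <= max_off_target:
            selected.append(name)
    return selected


selected = select_analogs(panel)
print(selected)
```

The selected analogs could then serve as the starting point for a further iteration of the same comparison.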

Therefore, the present invention contemplates an
assay for screening a drug candidate comprising: (a) allowing
at least one polypeptide comprising a functional domain of
interest to come into contact with at least one recognition
unit having a selective affinity for the polypeptide in the
presence of an amount of a drug candidate, such that the
polypeptide and the recognition unit are capable of
interacting when brought into contact with one another in the
absence of said drug candidate, and in which the functional
domain of interest is a domain selected from the group
consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo,
Notch/ankyrin repeat, zinc finger, leucine zipper, and
helix-turn-helix; and (b) determining the effect, if any, of the
presence of the amount of the drug candidate on the
interaction of the polypeptide with the recognition unit.

In one embodiment, the effect of the drug candidate
upon multiple, different interacting polypeptide-recognition
unit pairs is determined in which at least some of said
polypeptides have a functional domain that differs in sequence
but is capable of displaying the same binding specificity as
the functional domain in another of said polypeptides.

In another embodiment, at least one of said at least
one polypeptide or recognition unit contains a consensus
functional domain and consensus recognition unit,
respectively.

In another embodiment, the drug candidate is an
inhibitor of the polypeptide-recognition unit interaction that
is identified by detecting a decrease in the binding of
polypeptide to recognition unit in the presence of such
inhibitor.

In another embodiment, said polypeptide is a
polypeptide containing an SH3 domain produced by a method
comprising:

(i) screening a peptide library with an SH3 domain
to obtain one or more peptides that bind the SH3 domain;

(ii) using one of the peptides from step (i) to
screen a source of polypeptides to identify one or more
polypeptides containing an SH3 domain;

(iii) determining the amino acid sequence of the
polypeptides identified in step (ii); and

(iv) producing the one or more novel polypeptides
containing an SH3 domain.

In another embodiment, said polypeptide is a
polypeptide containing an SH3 domain produced by a method
comprising:

(i) screening a peptide library with an SH3 domain
to obtain a plurality of peptides that bind the SH3 domain;

(ii) determining a consensus sequence for the
peptides obtained in step (i);

(iii) producing a peptide comprising the consensus
sequence;

(iv) using the peptide comprising the consensus
sequence to screen a source of polypeptides to identify one or
more polypeptides containing an SH3 domain;

(v) determining the amino acid sequence of the
polypeptides identified in step (iv); and

(vi) producing the one or more polypeptides
containing an SH3 domain.
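
Step (ii) of the method above, determining a consensus sequence, can be sketched as a column-wise majority vote over peptides that have already been aligned to equal length. This is only one simple way to derive a consensus and is not asserted to be the procedure used herein. The peptide sequences, the function name `consensus`, and the 60% threshold are hypothetical choices made for illustration.

```python
from collections import Counter


def consensus(peptides, min_fraction=0.6):
    """Column-wise consensus of pre-aligned, equal-length peptides.

    A position is reported as 'x' when no single residue reaches
    `min_fraction` of the peptides (an assumed, illustrative cutoff).
    """
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("peptides must be pre-aligned to equal length")
    result = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(result)


# Hypothetical binders, invented for illustration only.
binders = ["RPLPPLP", "RALPSLP", "KPLPALP", "RSLPPLP"]
print(consensus(binders))
```

A peptide synthesized from the resulting pattern would then be used as the recognition unit in step (iv).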

In a preferred embodiment, the effect of the drug
candidate upon multiple, different interacting
polypeptide-recognition unit pairs is determined in which preferably at
least some (e.g., at least 2, 3, 4, 5, 7, or 10) of said
polypeptides have functional domains that vary in sequence yet
are capable of displaying the same binding specificity, i.e.,
binding to the same recognition unit. In another specific
embodiment, at least one of said polypeptides and/or
recognition units contain a consensus functional domain and
recognition unit, respectively (and thus are not known to be
naturally expressed proteins). In one embodiment, the
polypeptide is a novel polypeptide identified by the methods
of the present invention. In a specific embodiment, an
inhibitor of the polypeptide-recognition unit interaction is
identified by detecting a decrease in the binding of
polypeptide to recognition unit in the presence of such
inhibitor.

A common problem in the development of new drugs is
that of identifying a single compound, or a small number of
compounds, that possess a desirable characteristic from among a
background of a large number of compounds that lack that
desired characteristic. This problem arises both in the
testing of compounds that are natural products from plant,
animal, or microbial sources and in the testing of man-made
compounds. Typically, hundreds, or even thousands, of
compounds are randomly screened by the use of in vitro assays
such as those that monitor the compound's effect on some
enzymatic activity, its ability to bind to a reference
substance such as a receptor or other protein, or its ability
to disrupt the binding between a receptor and its ligand.

The compounds which pass this original screening
test are known as "lead" compounds. These lead compounds are
then put through further testing, including, eventually, in
vivo testing in animals and humans, from which the promise
shown by the lead compounds in the original in vitro tests is
either confirmed or refuted. See Remington's Pharmaceutical
Sciences, 1990, A.R. Gennaro, ed., Chapter 8, pages 60-62,
Mack Publishing Co., Easton, PA; Ecker and Crooke, 1995,
Bio/Technology 13:351-360.

There is a continual need for new compounds to be
tested in the in vitro assays that make up the first testing
step described above. There is also a continual need for new
assays by which the pharmacological activities of these
compounds may be tested. It is an object of the present
invention to provide such new assays to determine whether a
candidate compound is capable of affecting the binding between
a polypeptide containing a functional domain and a recognition
unit that binds to that functional domain. In particular, it
is an object of the present invention to provide polypeptides,
particularly novel ones, containing functional domains and
their corresponding recognition units for use in the
above-described assays. The use of these polypeptides greatly
expands the number of assays that may be used to screen
potential drug candidates for useful pharmacological
activities (as well as to identify potential drug candidates
that display adverse or undesirable pharmacological
activities). In one particular embodiment of the present
invention, the polypeptides contain an SH3 domain.

In one embodiment of the present invention, such
polypeptides are identified by a method comprising: using a
recognition unit that is capable of binding to a predetermined
functional domain to screen a source of polypeptides, thus
identifying novel polypeptides containing the functional
domain or a similar functional domain.

In a particular embodiment of the above-described
method, the novel polypeptide comprises an SH3 domain and is
obtained by:

(i) screening a peptide library with the SH3 domain
to obtain one or more peptides that bind the SH3 domain;

(ii) using one of the peptides from step (i),
preferably in the form of a multivalent complex, to screen a
source of polypeptides to identify one or more novel
polypeptides containing SH3 domains;

(iii) determining the amino acid sequence of the
polypeptides identified in step (ii); and

(iv) producing the one or more novel polypeptides
containing SH3 domains.

In another embodiment of the above-described method,
the novel polypeptide containing an SH3 domain is obtained by:

(i) screening a peptide library with the SH3 domain
to obtain peptides that bind the SH3 domain;

(ii) determining a consensus sequence for the
peptides obtained in step (i);

(iii) producing a peptide comprising the consensus
sequence;

(iv) using the peptide comprising the consensus
sequence to screen a source of polypeptides to identify one or
more novel polypeptides containing SH3 domains;

(v) determining the amino acid sequence of the novel
polypeptides identified in step (iv); and

(vi) producing the one or more novel polypeptides
containing SH3 domains.

One of ordinary skill in the art will recognize that
it will not always be necessary to utilize the entire novel
polypeptide containing the SH3 domain in the assays described
herein. Often, a portion of the polypeptide that contains the
SH3 domain will be sufficient, e.g., a glutathione
S-transferase (GST)-SH3 domain fusion protein. See Figures 10A
and 10B for a depiction of the portions of the exemplary novel
polypeptides that contain SH3 domains.

A typical assay of the present invention consists of
at least the following components: (1) a molecule (e.g., a
protein or polypeptide) comprising a functional domain; (2) a
recognition unit that selectively binds to the functional
domain; and (3) a candidate compound, suspected of having the
capacity to affect the binding between the protein containing
the functional domain and the recognition unit. The assay
components may further comprise (4) a means of detecting the
binding of the protein comprising the functional domain and
the recognition unit. Such means can be, e.g., a detectable
label affixed to the protein comprising the functional domain,
the recognition unit, or the candidate compound.

In a specific embodiment, the protein comprising the
functional domain is a novel protein discovered by the methods
of the present invention.

In another specific embodiment, the invention
provides a method of identifying a compound that affects the
binding of a molecule comprising a functional domain and a
recognition unit that selectively binds to the functional
domain comprising:

(a) contacting the molecule comprising the
functional domain and the recognition unit under conditions
conducive to binding in the presence of a candidate compound
and measuring the amount of binding between the molecule and
the recognition unit; and

(b) comparing the amount of binding in step (a) with
the amount of binding known or determined to occur between the
molecule and the recognition unit in the absence of the
candidate compound, where a difference in the amount of
binding between step (a) and the amount of binding known or
determined to occur between the molecule and the recognition
unit in the absence of the candidate compound indicates that
the candidate compound is a compound that affects the binding
of the molecule comprising a functional domain and the
recognition unit.

In a specific embodiment, the molecule
comprising the functional domain is a novel protein discovered
by the methods of the present invention. In another specific
embodiment, the functional domain is an SH3 domain.
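
The comparison in step (b) can be sketched as follows. The sketch assumes that binding has been reduced to a single number in each condition and that a relative difference below an arbitrary tolerance is treated as no difference. The function name `classify_effect` and the 10% tolerance are hypothetical and are not taken from this disclosure.

```python
def classify_effect(binding_with, binding_without, tolerance=0.10):
    """Compare binding measured with and without the candidate compound.

    `tolerance` is an assumed relative difference below which the two
    measurements are treated as indistinguishable.
    """
    if binding_without <= 0:
        raise ValueError("reference binding must be positive")
    change = (binding_with - binding_without) / binding_without
    if change <= -tolerance:
        return "decreases binding"
    if change >= tolerance:
        return "increases binding"
    return "no detectable effect"


print(classify_effect(binding_with=0.3, binding_without=1.0))
```

In practice the tolerance would be set from the observed variability of replicate measurements rather than fixed in advance.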

In one embodiment, the assay comprises allowing the
polypeptide containing an SH3 domain to contact a recognition
unit that selectively binds to the SH3 domain in the presence
and in the absence of the candidate compound under conditions
such that binding of the recognition unit to the protein
containing an SH3 domain will occur unless that binding is
disrupted or prevented by the candidate compound. By
detecting the amount of binding of the recognition unit to the
protein containing an SH3 domain in the presence of the
candidate compound and comparing that amount of binding to the
amount of binding of the recognition unit to the protein or
polypeptide containing an SH3 domain in the absence of the
candidate compound, it is possible to determine whether the
candidate compound affects the binding and thus is a useful
lead compound for the modulation of the activity of proteins
containing the SH3 domain. The effect of the candidate
compound may be to either increase or decrease the binding.

One version of an assay suitable for use in the
present invention comprises binding the protein containing an
SH3 domain to a solid support such as the wells of a
microtiter plate. The wells contain a suitable buffer and
other substances to ensure that conditions in the wells permit
the binding of the protein or polypeptide containing an SH3
domain to its recognition unit. The recognition unit and a
candidate compound are then added to the wells. The
recognition unit is preferably labeled, e.g., it might be
biotinylated or labeled with a radioactive moiety, or it might
be linked to an enzyme, e.g., alkaline phosphatase. After a
suitable period of incubation, the wells are washed to remove
any unbound recognition unit and compound. If the candidate
compound does not interfere with the binding of the protein or
polypeptide containing an SH3 domain to the labeled
recognition unit, the labeled recognition unit will bind to
the protein or polypeptide containing an SH3 domain in the
well. This binding can then be detected. If the candidate
compound interferes with the binding of the protein or
polypeptide containing an SH3 domain and the labeled
recognition unit, label will not be present in the wells, or
will be present to a lesser degree than in control wells
that contain the protein or
polypeptide containing an SH3 domain and the labeled
recognition unit but to which no candidate compound is added.
Of course, it is possible that the presence of the candidate
compound will increase the binding between the protein or
polypeptide containing an SH3 domain and the labeled
recognition unit. Alternatively, the recognition unit can be
affixed to a solid substrate during the assay. Functional
domains other than SH3 domains and their corresponding
recognition units can also be used.
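
Readings from such a plate can be expressed relative to the control wells, as in the sketch below. It assumes replicate wells and adds background ("blank") wells that are not part of the description above; they stand for wells lacking the immobilized protein. The readings and the function name `percent_of_control` are hypothetical.

```python
from statistics import mean


def percent_of_control(test_wells, control_wells, blank_wells):
    """Background-corrected mean signal of test wells as a % of control wells.

    Control wells contain immobilized protein and labeled recognition unit
    but no candidate compound. Blank wells are an added assumption.
    """
    background = mean(blank_wells)
    control = mean(control_wells) - background
    if control <= 0:
        raise ValueError("control signal does not exceed background")
    return 100.0 * (mean(test_wells) - background) / control


# Hypothetical label readings from replicate wells.
blank = [0.05, 0.05]
control = [1.05, 1.05]
test = [0.30, 0.30]
percent = percent_of_control(test, control, blank)
print(round(percent, 1))
```

A value well below 100 would suggest interference with binding, and a value well above 100 would suggest enhanced binding.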

In a specific embodiment of the above-described
method, the protein or polypeptide containing an SH3 domain is
a novel protein or polypeptide containing an SH3 domain that
has been identified by the methods of the present invention.

5.5. Use of Polypeptides Containing Functional
Domains to Discover Polypeptides Involved in
Pharmacological Activities

Using the methods of the present invention, it is
possible to identify and isolate large numbers of polypeptides
containing functional domains, e.g., SH3 domains. Using these
polypeptides, one can construct a matrix relating the
polypeptides to an array of candidate drug compounds. For
example, Table 1 shows such a matrix.

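TABLE 1

      A   B   C   D   E   F   G   H   I   J
 1    -   -   -   -   -   -   -   -   -   -
 2    -   X   -   X   -   -   -   X   -   -
 3    -   -   -   -   -   -   -   -   -   -
 4    -   -   -   -   -   -   -   -   -   -
 5    -   -   -   -   -   X   -   -   -   -
 6    -   -   -   -   -   -   -   -   -   -
 7    -   -   X   -   -   -   -   X   -   -
 8    -   -   -   -   -   -   -   -   -   -
 9    X   -   -   -   -   -   -   -   -   -
10    -   -   -   -   -   -   -   -   -   -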

In Table 1, the columns headed by letters at the top
of the table represent different polypeptides containing SH3
domains (preferably novel polypeptides identified by the
methods of the invention). The rows numbered along the left
side of the table represent recognition units with various
specificities to SH3 domains. For each candidate drug compound,
a table such as Table 1 is generated from the results of
binding assays. An X placed at the intersection of a
particular numbered row and lettered column represents a
positive assay for binding, i.e., the candidate drug compound
affected the binding of the recognition unit of that
particular row to the SH3 domain of that particular column.

Such data as that illustrated above is used to
determine whether candidate drug compounds display or are at
risk of displaying desirable or undesirable physiological or
pharmacological activities. For example, in Table 1, the drug
compound inhibits the binding of recognition unit 2 to the SH3
domains of polypeptides B, D, and H; the compound inhibits the
binding of recognition unit 5 to the SH3 domain of polypeptide
F; the compound inhibits the binding of recognition unit 7 to
the SH3 domains of polypeptides C and H; and the compound
inhibits the binding of recognition unit 9 to the SH3 domain
of polypeptide A.

If interaction with polypeptide H leads to the
desirable physiological or pharmacological activity, then this
drug candidate might be a good lead. However, interaction
with polypeptides A, B, C, D, and F would need to be
evaluated for potential side effects.
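
The reading of Table 1 given above can be reproduced with a small sketch in which each numbered row is mapped to the lettered columns marked with an X. The variable and function names are hypothetical. The data are those stated in the text for Table 1.

```python
# Table 1 as a mapping: recognition unit (row) -> polypeptides (columns)
# whose binding to that recognition unit the candidate compound affected.
table_1 = {2: {"B", "D", "H"}, 5: {"F"}, 7: {"C", "H"}, 9: {"A"}}


def off_target_polypeptides(table, desired):
    """Polypeptides affected by the compound other than the desired ones."""
    affected = set().union(*table.values())
    return sorted(affected - set(desired))


off_targets = off_target_polypeptides(table_1, desired={"H"})
print(off_targets)
```

With polypeptide H taken as the desired target, the remaining polypeptides are the ones flagged for side-effect evaluation.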

As the maps are generated and pharmacological
effects observed, the maps will allow strategic assessment of
the specificity necessary to obtain the desired
pharmacological effect. For example, if compounds 2 and 7 are
able to affect some pharmacological activity, while compounds
5 and 9 do not affect that activity, then polypeptide H is
likely to be involved in that pharmacological activity. For
example, if compounds 2 and 7 were both able to inhibit mast
cell degranulation, while compounds 5 and 9 did not, it is
likely that polypeptide H is involved in mast cell
degranulation.
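
The inference in the preceding paragraph amounts to a set operation: keep the polypeptides shared by every active agent and discard any that an inactive agent also touches. The sketch below reads the numbered rows of Table 1 as the agents being compared, which is an interpretive assumption. The function and variable names are hypothetical.

```python
def implicated_polypeptides(profiles, active, inactive):
    """Polypeptides hit by every active agent and by no inactive agent."""
    shared = set.intersection(*(profiles[a] for a in active))
    excluded = set().union(*(profiles[i] for i in inactive))
    return sorted(shared - excluded)


# Interaction profiles as stated for Table 1 (row number -> polypeptides).
profiles = {2: {"B", "D", "H"}, 5: {"F"}, 7: {"C", "H"}, 9: {"A"}}
implicated = implicated_polypeptides(profiles, active=[2, 7], inactive=[5, 9])
print(implicated)
```

The result is a hypothesis about involvement, not proof of it, and would be tested by further assays.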

Accordingly, the present invention provides a method
of utilizing the polypeptides comprising functional domains of
the present invention in an assay to determine the
participation of those polypeptides in pharmacological
activities. In a particular embodiment, the polypeptides
comprise SH3 domains.

In another embodiment, the method comprises:

(a) contacting a drug candidate with a molecule
comprising a functional domain under conditions conducive to
binding, and detecting or measuring any specific binding that
occurs; and

(b) repeating step (a) with a plurality of different
molecules, each comprising a different functional domain but
capable of binding to a single predetermined recognition unit
under appropriate conditions.

Preferably, at least one of said molecules is a
novel polypeptide identified by the methods of the present
invention. In a specific embodiment, the molecules comprise
the SH3 domains of Src, Abl, Cortactin, Phospholipase Cγ, Nck,
Crk, p53bp2, Amphiphysin, Grb2, RasGap, or
Phosphatidylinositol 3' kinase.

The present invention also provides a method of
determining the potential pharmacological activities of a
molecule comprising:

(a) contacting the molecule with a compound
comprising a functional domain under conditions conducive to
binding;

(b) detecting or measuring any specific binding that
occurs; and

(c) repeating steps (a) and (b) with a plurality of
different compounds, each compound comprising a functional
domain of different sequence but capable of displaying the
same binding specificity.

In a specific embodiment, the functional domain is an
SH3 domain.

In another embodiment, the compounds comprise the
SH3 domains of Src, Abl, Cortactin, Phospholipase Cγ, Nck,
Crk, p53bp2, Amphiphysin, Grb2, RasGap, or
Phosphatidylinositol 3' kinase.

The present invention also provides a method of 
identifying a compound that affects the binding of a molecule
comprising a functional domain to a recognition unit that
selectively binds to the functional domain comprising: 

(a) contacting the molecule comprising the
functional domain and the recognition unit under conditions
conducive to binding in the presence of a candidate compound
and measuring the amount of binding between the molecule and
the recognition unit and in which the functional domain of
interest is a domain selected from the group consisting of an
SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat,
zinc finger, leucine zipper, and helix-turn-helix; and

(b) comparing the amount of binding in step (a) with
the amount of binding known or determined to occur between the
molecule and the recognition unit in the absence of the
candidate compound, where a difference in the amount of
binding between step (a) and the amount of binding known or
determined to occur between the molecule and the recognition
unit in the absence of the candidate compound indicates that
the candidate compound is a compound that affects the binding
of the molecule comprising a functional domain and the
recognition unit.

In a specific embodiment, the functional domain is 
an SH3 domain. 

5.6. Use of More Than One Recognition Unit Simultaneously

It has been found that when screening a source of
polypeptides with a recognition unit, it is possible to use
more than one recognition unit at the same time. In
particular, it has been found that as many as five different
recognition units may be used simultaneously to screen a
source of polypeptides.

In particular, when the recognition units are
biotinylated peptides and the source of polypeptides is a cDNA
expression library, the steps of preconjugation of the
biotinylated peptides to streptavidin-alkaline phosphatase as
well as the steps involved in screening the cDNA expression
library may be carried out in essentially the same manner as
is done when a single biotinylated peptide is used as a
recognition unit. See Section 6.1 for details. The key
difference when using more than one biotinylated peptide at a
time is that the peptides are combined either before or at the
step where they are placed in contact with the polypeptides
from which selection occurs.

In an embodiment employing a bacteriophage
expression library to express the polypeptides, when the
positive clones are worked up to the level of isolated
plaques, the clonal bacteriophage from the isolated plaques
may be tested against each of the biotinylated peptides
individually, in order to determine to which of the several
peptides that were used as recognition units in the primary
screen the phage are actually binding.
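
The record keeping for this retesting step can be sketched as below. The sketch assumes each isolated plaque has been scored as binding or not binding each peptide individually. The plaque and peptide labels and the results are hypothetical.

```python
def deconvolute(retest_results):
    """Map each clone to the peptides that bound it when tested one at a time."""
    return {
        clone: sorted(p for p, bound in results.items() if bound)
        for clone, results in retest_results.items()
    }


# Hypothetical retest of two plaques against a pool of three peptides.
retest = {
    "plaque_1": {"peptide_A": True, "peptide_B": False, "peptide_C": False},
    "plaque_2": {"peptide_A": False, "peptide_B": True, "peptide_C": True},
}
assignments = deconvolute(retest)
print(assignments)
```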

5.7. Use of Recognition Units from
Known Amino Acid Sequences

In many cases it may not be necessary to screen a
collection of substances, e.g., a peptide library, in order to
obtain a recognition unit for a given functional domain. In
the case of peptide recognition units, for example, it is
sometimes possible to identify a recognition unit by
inspection of known amino acid sequences. Stretches of these
amino acid sequences that resemble known binding sequences for
the functional domain can be synthesized and screened against
a source of polypeptides in order to obtain a plurality of
polypeptides comprising the given functional domain.
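
Inspection of a known sequence for such stretches can be sketched as a pattern search. The proline-rich pattern `P..P` (often written PxxP) is used here as an assumed stand-in for a "known binding sequence" for SH3 domains. The protein sequence, the flank length, and the function name are hypothetical.

```python
import re


def find_candidate_stretches(sequence, motif=r"P..P", flank=3):
    """Return (start, stretch) for each motif match plus flanking residues.

    `motif` is a regular expression over one-letter amino acid codes.
    """
    hits = []
    # The lookahead wrapper lets overlapping matches be reported as well.
    for m in re.finditer(f"(?=({motif}))", sequence):
        start = max(0, m.start() - flank)
        end = min(len(sequence), m.start() + len(m.group(1)) + flank)
        hits.append((start, sequence[start:end]))
    return hits


# Hypothetical protein sequence, invented for illustration only.
protein = "MKTAYRPLPPLPGSWQ"
stretches = find_candidate_stretches(protein)
print(stretches)
```

Each reported stretch would be a candidate peptide to synthesize and screen. A pattern match alone does not establish binding.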

Prior to the disclosure, in the present invention, of
methods of preparing recognition units having generic
specificity, it would have been thought fruitless to pursue
this approach. The expectation would have been that a
recognition unit, chosen from published amino acid sequences
as described above, would have been useful, at best, to
identify a single protein containing a functional domain.

5.8. Isolation and Expression of Nucleic Acids Encoding
Polypeptides Comprising a Functional Domain

In particular aspects, the invention provides amino
acid sequences of polypeptides comprising functional domains,
preferably human polypeptides, and fragments and derivatives
thereof which comprise an antigenic determinant (i.e., can be
recognized by an antibody) or which are functionally active,
as well as nucleic acid sequences encoding the foregoing.
"Functionally active" material as used herein refers to that
material displaying one or more functional activities, e.g., a
biological activity, antigenicity (capable of binding to an
antibody), immunogenicity, or comprising a functional domain
that is capable of specific binding to a recognition unit.
In specific embodiments, the invention provides fragments of
polypeptides comprising a functional domain consisting of at
least 40 amino acids, or of at least 75 amino acids. Nucleic
acids encoding the foregoing are provided. Functional
fragments of at least 10 or 20 amino acids are also provided.

In other specific embodiments, the invention
provides nucleotide sequences and subsequences encoding
polypeptides comprising a functional domain, preferably human
polypeptides, consisting of at least 25 nucleotides, at least
50 nucleotides, or at least 150 nucleotides. Nucleic acids
encoding fragments of the polypeptides comprising a functional
domain are provided, as well as nucleic acids complementary to
and capable of hybridizing to such nucleic acids. In one
embodiment, such a complementary sequence may be complementary
to a cDNA sequence encoding a polypeptide comprising a
functional domain of at least 25 nucleotides, or of at least
100 nucleotides. In a preferred aspect, the invention
utilizes cDNA sequences encoding human polypeptides comprising
a functional domain or a portion thereof.

Any eukaryotic cell can potentially serve as the
nucleic acid source for the molecular cloning of polypeptides
comprising a functional domain. The DNA may be obtained by
standard procedures known in the art (e.g., a DNA "library"),
by cDNA cloning, or by the cloning of genomic DNA, or
fragments thereof, purified from the desired cell (see, for
example, Sambrook et al., 1989, Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Laboratory, 2d. Ed., Cold Spring
Harbor, New York; Glover, D.M. (ed.), 1985, DNA Cloning: A
Practical Approach, IRL Press, Ltd., Oxford, U.K., Vol. I, II).
Clones derived from genomic DNA may contain regulatory and
intron DNA regions in addition to coding regions; clones
derived from cDNA will contain only exon sequences. Whatever
the source, the gene encoding a polypeptide comprising a
functional domain should be molecularly cloned into a suitable
vector for propagation of the gene.

In the molecular cloning of the gene from genomic
DNA, DNA fragments are generated, some of which will encode
the desired gene. The DNA may be cleaved at specific sites
using various restriction enzymes. Alternatively, one may use
DNAse in the presence of manganese to fragment the DNA, or the
DNA can be physically sheared, as for example, by sonication.
The linear DNA fragments can then be separated according to
size by standard techniques, including but not limited to,
agarose and polyacrylamide gel electrophoresis and column
chromatography.
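
Cleavage at specific sites and ordering of the resulting fragments by size can be sketched for a single strand of a linear sequence. The example uses the recognition site GAATTC, cut after the first base, as for the enzyme EcoRI. The DNA sequence and the function name `digest` are hypothetical, and the choice of enzyme is an assumption made for illustration.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at each occurrence of `site`.

    Returns fragment lengths, largest first, as they would be ordered
    when separated by size. Only one strand is modeled.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Hypothetical 26-base sequence containing two recognition sites.
dna = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
fragments = digest(dna)
print(fragments)
```

The same list of sizes could be compared with the sizes expected from a known restriction map when one is available.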

Once a gene encoding a particular polypeptide
comprising a functional domain has been isolated from a first
species, it is a routine matter to isolate the corresponding
gene from another species. Identification of the specific DNA
fragment from another species containing the desired gene may
be accomplished in a number of ways. For example, if an
amount of a portion of a gene or its specific RNA from the
first species, or a fragment thereof, e.g., the functional
domain, is available and can be purified and labeled, the
generated DNA fragments from another species may be screened
by nucleic acid hybridization to the labeled probe (Benton, W.
and Davis, R., 1977, Science 196, 180; Grunstein, M. and
Hogness, D., 1975, Proc. Natl. Acad. Sci. U.S.A. 72, 3961).
Those DNA fragments with substantial homology to the probe
will hybridize. In a preferred embodiment, PCR using primers
that hybridize to a known sequence of a gene of one species
can be used to amplify the homolog of such gene in a different
species. The amplified fragment can then be isolated and
inserted into an expression or cloning vector. It is also
possible to identify the appropriate fragment by restriction
enzyme digestion(s) and comparison of fragment sizes with
those expected according to a known restriction map if such is
available. Further selection can be carried out on the basis
of the properties of the gene. Alternatively, the presence of
the gene may be detected by assays based on the physical,
chemical, or immunological properties of its expressed
product. For example, cDNA clones, or DNA clones which
hybrid-select the proper mRNAs, can be selected which produce
a protein that, e.g., has similar or identical electrophoretic
migration, isoelectric focusing behavior, proteolytic digestion
maps, in vitro aggregation activity ("adhesiveness") or
antigenic properties as known for the particular polypeptide
comprising a functional domain from the first species. If an
antibody to that particular polypeptide is available, the
corresponding polypeptide from another species may be
identified by binding of labeled antibody to the putative
polypeptide-synthesizing clones, in an ELISA (enzyme-linked
immunosorbent assay)-type procedure.

Genes encoding polypeptides comprising a functional 
domain can also be identified by mRNA selection by nucleic
acid hybridization followed by in vitro translation. In this
procedure, DNA fragments are used to isolate complementary mRNAs
by hybridization. Such DNA fragments may represent available,
purified DNA of genes encoding polypeptides comprising a
functional domain of a first species. Immunoprecipitation
analysis or functional assays (e.g., ability to bind to a
recognition unit) of the in vitro translation products of the
isolated mRNAs identify the mRNA and, therefore, the
complementary DNA fragments that contain the desired
sequences. In addition, specific mRNAs may be selected by
adsorption of polysomes isolated from cells to immobilized
antibodies specifically directed against polypeptides
comprising a functional domain. A radiolabelled cDNA of a
gene encoding a polypeptide comprising a functional domain can
be synthesized using the selected mRNA (from the adsorbed
polysomes) as a template. The radiolabelled mRNA or cDNA may
then be used as a probe to identify the DNA fragments that
represent the gene encoding the polypeptide comprising a
functional domain of another species from among other genomic
DNA fragments. In a specific embodiment, human homologs of
mouse genes are obtained by methods described above. In
various embodiments, the human homolog is hybridizable to the
mouse homolog under conditions of low, moderate, or high
stringency. By way of example and not limitation, procedures
using such conditions of low stringency are as follows (see 
also Shilo and Weinberg, 1981, Proc. Natl. Acad. Sci. USA
78:6789-6792): Filters containing DNA are pretreated for 6 h
at 40°C in a solution containing 35% formamide, 5X SSC, 50 mM
Tris-HCl (pH 7.5), 5 mM EDTA, 0.1% PVP, 0.1% Ficoll, 1% BSA,
and 500 μg/ml denatured salmon sperm DNA. Hybridizations are
carried out in the same solution with the following
modifications: 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 μg/ml
salmon sperm DNA, 10% (wt/vol) dextran sulfate, and 5-20 × 10⁶
cpm ³²P-labeled probe is used. Filters are incubated in
hybridization mixture for 18-20 h at 40°C, and then washed for
1.5 h at 55°C in a solution containing 2X SSC, 25 mM Tris-HCl
(pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is
replaced with fresh solution and incubated an additional 1.5 h
at 60°C. Filters are blotted dry and exposed for
autoradiography. If necessary, filters are washed for a third
time at 65-68°C and reexposed to film. Other conditions of
low stringency which may be used are well known in the art
(e.g., as employed for cross-species hybridizations).

By way of example and not limitation, procedures 
using conditions of high stringency are as follows: 
Prehybridization of filters containing DNA is carried out for 
8 h to overnight at 65°C in buffer composed of 6X SSC, 50 mM
Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02%
BSA, and 500 μg/ml denatured salmon sperm DNA. Filters are
hybridized for 48 h at 65°C in prehybridization mixture
containing 100 μg/ml denatured salmon sperm DNA and 5-20 × 10⁶
cpm of ³²P-labeled probe. Washing of filters is done at 37°C
for 1 h in a solution containing 2X SSC, 0.01% PVP, 0.01%
Ficoll, and 0.01% BSA. This is followed by a wash in 0.1X SSC
at 50°C for 45 min before autoradiography. Other conditions
of high stringency which may be used are well known in the
art.

The identified and isolated gene encoding a 
polypeptide comprising a functional domain can then be 

inserted into an appropriate cloning vector. A large number
of vector-host systems known in the art may be used. Possible
vectors include, but are not limited to, plasmids or modified
viruses, but the vector system must be compatible with the
host cell used. Such vectors include, but are not limited to,
bacteriophages such as lambda derivatives, or plasmids such as
pBR322 or pUC plasmid derivatives. The insertion into a
cloning vector can, for example, be accomplished by ligating
the DNA fragment into a cloning vector which has complementary
cohesive termini. However, if the complementary restriction
sites used to fragment the DNA are not present in the cloning
vector, the ends of the DNA molecules may be enzymatically
modified. Alternatively, any site desired may be produced by
ligating nucleotide sequences (linkers) onto the DNA termini;
these ligated linkers may comprise specific chemically
synthesized oligonucleotides encoding restriction endonuclease
recognition sequences. In an alternative method, the cleaved
vector and gene may be modified by homopolymeric tailing.
Recombinant molecules can be introduced into host cells via
transformation, transfection, infection, electroporation,
etc., so that many copies of the gene sequence are generated.

In an alternative method, the desired gene may be 
identified and isolated after insertion into a suitable 
cloning vector in a "shot gun" approach. Enrichment for the 
desired gene, for example, by size fractionation, can be
done before insertion into the cloning vector.

In specific embodiments, transformation of host
cells with recombinant DNA molecules that incorporate the
isolated gene, cDNA, or synthesized DNA sequence enables
generation of multiple copies of the gene. Thus, the gene may
be obtained in large quantities by growing transformants,
isolating the recombinant DNA molecules from the transformants
and, when necessary, retrieving the inserted gene from the
isolated recombinant DNA.

The nucleic acid coding for a polypeptide comprising 
a functional domain of the invention can be inserted into an 
appropriate expression vector, i.e., a vector which contains 

the necessary elements for the transcription and translation
of the inserted protein-coding sequence. The necessary
transcriptional and translational signals can also be supplied
by the native gene encoding the polypeptide and/or its
flanking regions. A variety of host-vector systems may be
utilized to express the protein-coding sequence. These
include but are not limited to mammalian cell systems infected
with virus (e.g., vaccinia virus, adenovirus, etc.); insect
cell systems infected with virus (e.g., baculovirus);
microorganisms such as yeast containing yeast vectors, or
bacteria transformed with bacteriophage DNA, plasmid DNA, or
cosmid DNA. The expression elements of vectors vary in their
strengths and specificities. Depending on the host-vector
system utilized, any one of a number of suitable transcription
and translation elements may be used.

Any of the methods previously described for the
insertion of DNA fragments into a vector may be used to
construct expression vectors containing a chimeric gene
consisting of appropriate transcriptional/translational
control signals and the protein coding sequences. These
methods may include in vitro recombinant DNA and synthetic
techniques and in vivo recombinants (genetic recombination).
Expression of nucleic acid sequence encoding a protein or
peptide fragment may be regulated by a second nucleic acid
sequence so that the protein or peptide is expressed in a host
transformed with the recombinant DNA molecule. For example,
expression of a protein may be controlled by any
promoter/enhancer element known in the art. Promoters which
may be used to control gene expression include, but are not
limited to, the SV40 early promoter region (Benoist and 
Chambon, 1981, Nature 290, 304-310), the promoter contained in 
the 3' long terminal repeat of Rous sarcoma virus (Yamamoto,
et al., 1980, Cell 22, 787-797), the herpes thymidine kinase
promoter (Wagner et al., 1981, Proc. Natl. Acad. Sci. U.S.A.
78, 1441-1445), the regulatory sequences of the
metallothionein gene (Brinster et al., 1982, Nature 296, 39-
42); prokaryotic expression vectors such as the β-lactamase
promoter (Villa-Kamaroff, et al., 1978, Proc. Natl. Acad. Sci.
U.S.A. 75, 3727-3731), or the tac promoter (DeBoer, et al.,
1983, Proc. Natl. Acad. Sci. U.S.A. 80, 21-25); see also
"Useful proteins from recombinant bacteria" in Scientific
American, 1980, 242, 74-94; plant expression vectors
comprising the nopaline synthetase promoter region (Herrera-
Estrella et al., Nature 303, 209-213) or the cauliflower
mosaic virus 35S RNA promoter (Gardner, et al., 1981, Nucl.
Acids Res. 9, 2871), and the promoter of the photosynthetic
enzyme ribulose biphosphate carboxylase (Herrera-Estrella et
al., 1984, Nature 310, 115-120); promoter elements from yeast
or other fungi such as the Gal 4 promoter, the ADC (alcohol
dehydrogenase) promoter, PGK (phosphoglycerol kinase)
promoter, alkaline phosphatase promoter, and the following
animal transcriptional control regions, which exhibit tissue 

specificity and have been utilized in transgenic animals:
elastase I gene control region which is active in pancreatic
acinar cells (Swift et al., 1984, Cell 38, 639-646; Ornitz et
al., 1986, Cold Spring Harbor Symp. Quant. Biol. 50, 399-409;
MacDonald, 1987, Hepatology 7, 425-515); insulin gene control
region which is active in pancreatic beta cells (Hanahan,
1985, Nature 315, 115-122), immunoglobulin gene control region
which is active in lymphoid cells (Grosschedl et al., 1984,
Cell 38, 647-658; Adames et al., 1985, Nature 318, 533-538;
Alexander et al., 1987, Mol. Cell. Biol. 7, 1436-1444), mouse
mammary tumor virus control region which is active in
testicular, breast, lymphoid and mast cells (Leder et al.,
1986, Cell 45, 485-495), albumin gene control region which is
active in liver (Pinkert et al., 1987, Genes and Devel. 1,
268-276), alpha-fetoprotein gene control region which is
active in liver (Krumlauf et al., 1985, Mol. Cell. Biol. 5,
1639-1648; Hammer et al., 1987, Science 235, 53-58); alpha 1-
antitrypsin gene control region which is active in the liver
(Kelsey et al., 1987, Genes and Devel. 1, 161-171), beta-
globin gene control region which is active in myeloid cells
(Mogram et al., 1985, Nature 315, 338-340; Kollias et al.,
1986, Cell 46, 89-94); myelin basic protein gene control region
which is active in oligodendrocyte cells in the brain
(Readhead et al., 1987, Cell 48, 703-712); myosin light chain-
2 gene control region which is active in skeletal muscle
(Sani, 1985, Nature 314, 283-286), and gonadotropic releasing
hormone gene control region which is active in the
hypothalamus (Mason et al., 1986, Science 234, 1372-1378).

Expression vectors containing inserts of genes 
encoding polypeptides comprising a functional domain can be 
identified by three general approaches: (a) nucleic acid 
hybridization, (b) presence or absence of "marker" gene 
functions, and (c) expression of inserted sequences. In the
first approach, the presence of a foreign gene inserted in an
expression vector can be detected by nucleic acid
hybridization using probes comprising sequences that are
homologous to the inserted gene. In the second approach, the
recombinant vector/host system can be identified and selected
based upon the presence or absence of certain "marker" gene
functions (e.g., thymidine kinase activity, resistance to
antibiotics, transformation phenotype, occlusion body
formation in baculovirus, etc.) caused by the insertion of
foreign genes in the vector. For example, if the gene
encoding a polypeptide comprising a functional domain is
inserted within the marker gene sequence of the vector,
recombinants containing the gene can be identified by the
absence of the marker gene function. In the third approach,
recombinant expression vectors can be identified by assaying
the foreign gene product expressed by the recombinant. Such
assays can be based, for example, on the physical or
functional properties of the gene product in in vitro assay
systems, e.g., ability to bind to recognition units. 

Once a particular recombinant DNA molecule is 
identified and isolated, several methods known in the art may 
be used to propagate it. Once a suitable host system and
growth conditions are established, recombinant expression
vectors can be propagated and prepared in quantity. As
previously explained, the expression vectors which can be used
include, but are not limited to, the following vectors or
their derivatives: human or animal viruses such as vaccinia
virus or adenovirus; insect viruses such as baculovirus; yeast
vectors; bacteriophage vectors (e.g., lambda), and plasmid and
cosmid DNA vectors, to name but a few.

In addition, a host cell strain may be chosen which
modulates the expression of the inserted sequences, or
modifies and processes the gene product in the specific
fashion desired. Expression from certain promoters can be
elevated in the presence of certain inducers; thus, expression
of the protein may be controlled. Furthermore, different host
cells have characteristic and specific mechanisms for the
translational and post-translational processing and
modification (e.g., glycosylation, cleavage) of proteins.
Appropriate cell lines or host systems can be chosen to ensure
the desired modification and processing of the foreign protein
expressed. For example, expression in a bacterial system can
be used to produce an unglycosylated core protein product.
Expression in yeast will produce a glycosylated product.
Expression in mammalian cells can be used to ensure "native"
glycosylation of a heterologous protein. Furthermore,
different vector/host expression systems may effect processing
reactions such as proteolytic cleavages to different extents.

In other specific embodiments, polypeptides
comprising a functional domain, or fragments, analogs, or
derivatives thereof may be expressed as a fusion, or chimeric
protein product (comprising the polypeptide, fragment, analog,
or derivative joined via a peptide bond to a heterologous
protein sequence (of a different protein)). Such a chimeric
product can be made by ligating the appropriate nucleic acid
sequences encoding the desired amino acid sequences to each
other by methods known in the art, in the proper reading
frame, and expressing the chimeric product by methods commonly
known in the art. Alternatively, such a chimeric product may
be made by protein synthetic techniques, e.g., by use of a
peptide synthesizer.

5.8.1 Identification and Purification of the
Expressed Gene Product

Once a recombinant which expresses the gene sequence
encoding a polypeptide comprising a functional domain is
identified, the gene product may be analyzed. This can be
achieved by assays based on the physical or functional
properties of the product, including radioactive labelling of
the product followed by analysis by gel electrophoresis.

Once the polypeptide comprising a functional domain
is identified, it may be isolated and purified by standard
methods including chromatography (e.g., ion exchange,
affinity, and sizing column chromatography), centrifugation,
differential solubility, or by any other standard technique
for the purification of proteins. The functional properties
may be evaluated using any suitable assay, including, but not
limited to, binding to a recognition unit.

5.9 Derivatives and Analogs of Polypeptides Comprising a 
Functional Domain 

The invention further provides derivatives 
(including but not limited to fragments) and analogs of 
polypeptides that are functionally active, e.g., comprising a 
functional domain. In a specific embodiment, the derivative 
or analog is functionally active, i.e., capable of exhibiting 
one or more functional activities associated with a full- 
length, wild-type polypeptide, e.g., binding to a recognition 
unit. As one example, such derivatives or analogs may have 
the antigenicity of the full-length polypeptide.

In particular, derivatives can be made by altering
gene sequences encoding polypeptides comprising a functional
domain by substitutions, additions or deletions that provide
for functionally equivalent molecules. Due to the degeneracy
of nucleotide coding sequences, other DNA sequences which
encode substantially the same amino acid sequence as a gene
encoding a polypeptide comprising a functional domain may be
used in the practice of the present invention. These include
but are not limited to nucleotide sequences comprising all or
portions of such genes which are altered by the substitution
of different codons that encode a functionally equivalent
amino acid residue within the sequence, thus producing a
silent change. Likewise, the derivatives of the invention
include, but are not limited to, those containing, as a 

primary amino acid sequence, all or part of the amino acid
sequence of a polypeptide comprising a functional domain
including altered sequences in which functionally equivalent
amino acid residues are substituted for residues within the
sequence resulting in a silent change. For example, one or
more amino acid residues within the sequence can be
substituted by another amino acid of a similar polarity which
acts as a functional equivalent, resulting in a silent
alteration. Substitutes for an amino acid within the sequence
may be selected from other members of the class to which the
amino acid belongs. For example, the nonpolar (hydrophobic)
amino acids include alanine, leucine, isoleucine, valine,
proline, phenylalanine, tryptophan and methionine. The polar
neutral amino acids include glycine, serine, threonine,
cysteine, tyrosine, asparagine, and glutamine. The positively
charged (basic) amino acids include arginine, lysine and
histidine. The negatively charged (acidic) amino acids
include aspartic acid and glutamic acid.
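
The class-based substitution rule described above can be sketched in Python. This is only an illustration, assuming standard one-letter residue codes; the names AMINO_ACID_CLASSES, residue_class, and is_conservative_substitution are hypothetical and do not appear in the specification. Membership in the same class is taken here as the sole criterion, which is a simplification of "functional equivalence."

```python
# Residue classes as listed in the text, written with standard
# one-letter amino acid codes (the one-letter encoding is an assumption).
AMINO_ACID_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue):
    """Return the name of the class containing `residue`, or None."""
    for name, members in AMINO_ACID_CLASSES.items():
        if residue.upper() in members:
            return name
    return None


def is_conservative_substitution(original, substitute):
    """True if both residues fall in the same class listed above."""
    cls = residue_class(original)
    return cls is not None and cls == residue_class(substitute)
```

Under this sketch, replacing leucine with isoleucine counts as a substitution within one class, whereas replacing aspartic acid with lysine does not.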

Derivatives or analogs of genes encoding 
polypeptides comprising a functional domain include but are 

not limited to those polypeptides which are substantially
homologous to the genes or fragments thereof, or whose
encoding nucleic acid is capable of hybridizing to a nucleic
acid sequence of the genes. 

The derivatives and analogs of the invention can be 
produced by various methods known in the art. The 
manipulations which result in their production can occur at
the gene or protein level. For example, the cloned gene
sequence can be modified by any of numerous strategies known
in the art (Maniatis, T., 1989, Molecular Cloning, A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory, Cold
Spring Harbor, New York). The sequence can be cleaved at
appropriate sites with restriction endonuclease(s), followed
by further enzymatic modification if desired, isolated, and
ligated in vitro. PCR primers can be constructed so as to
introduce desired sequence changes during PCR amplification of
a nucleic acid encoding the desired polypeptide. In the
production of the gene encoding a derivative or analog, care
should be taken to ensure that the modified gene remains
within the same translational reading frame, uninterrupted by
translational stop signals, in the gene region where the
desired activity is encoded.
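
The reading-frame precaution above can be illustrated with a short Python check. This is a rough sketch under stated assumptions: the sequences are DNA coding sequences that begin in frame at position 0, the standard genetic code applies, and the function names first_internal_stop and frame_is_preserved are hypothetical. Comparing only the overall length change will not detect offsetting insertions and deletions within the sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_internal_stop(coding_seq):
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    seq = coding_seq.upper()
    # Split into complete codons; trailing partial codons are ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def frame_is_preserved(original_seq, modified_seq):
    """True if the net length change is a multiple of three (no net
    frameshift) and the modified sequence has no premature stop codon."""
    same_frame = (len(modified_seq) - len(original_seq)) % 3 == 0
    return same_frame and first_internal_stop(modified_seq) is None
```

For instance, inserting one whole codon into a short open reading frame passes this check, while inserting a single nucleotide or introducing an internal TAA does not.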

Additionally, the sequence of the genes encoding 
polypeptides comprising a functional domain can be mutated in 
vitro or in vivo, to create and/or destroy translation, 
initiation, and/or termination sequences, or to create 

variations in coding regions and/or form new restriction
endonuclease sites or destroy preexisting ones, to facilitate
further in vitro modification. Any technique for mutagenesis
known in the art can be used, including but not limited to, in
vitro site-directed mutagenesis (Hutchinson, C., et al., 1978,
J. Biol. Chem. 253:6551), use of TAB® linkers (Pharmacia), etc.

Manipulations of the sequence may also be made at 
the protein level. Included within the scope of the invention 
are protein fragments or other derivatives or analogs which 
are differentially modified during or after translation, e.g., 

by glycosylation, acetylation, phosphorylation, amidation,
derivatization by known protecting/blocking groups,
proteolytic cleavage, linkage to an antibody molecule or other
cellular ligand, etc. Any of numerous chemical modifications
may be carried out by known techniques, including but not
limited to specific chemical cleavage by cyanogen bromide,
trypsin, chymotrypsin, papain, V8 protease, NaBH₄;
acetylation, formylation, oxidation, reduction; metabolic
synthesis in the presence of tunicamycin; etc.

In addition, analogs and derivatives can be 
chemically synthesized. For example, a peptide corresponding 
to a portion of a polypeptide comprising a functional domain 

can be synthesized by use of a peptide synthesizer.
Furthermore, if desired, nonclassical amino acids or chemical
amino acid analogs can be introduced as a substitution or
addition into the sequence. Non-classical amino acids include
but are not limited to the D-isomers of the common amino
acids, α-amino isobutyric acid, 4-aminobutyric acid,
hydroxyproline, sarcosine, citrulline, cysteic acid, t-
butylglycine, t-butylalanine, phenylglycine,
cyclohexylalanine, β-alanine, designer amino acids such as β-
methyl amino acids, Cα-methyl amino acids, and Nα-methyl amino
acids.

5.10 Antibodies to Polypeptides Comprising
a Functional Domain

According to one embodiment, the invention provides
antibodies and fragments thereof containing the binding domain
thereof, directed against polypeptides comprising a functional
domain. Accordingly, polypeptides comprising a functional
domain, fragments or analogs or derivatives thereof, in
particular, may be used as immunogens to generate antibodies
against such polypeptides, fragments or analogs or
derivatives. Such antibodies can be polyclonal, monoclonal,
chimeric, single chain, Fab fragments, or from an Fab
expression library. In a specific embodiment, antibodies
specific to the functional domain of a polypeptide comprising
a functional domain may be prepared.

Various procedures known in the art may be used for
the production of polyclonal antibodies. In a particular
embodiment, rabbit polyclonal antibodies to an epitope of a
polypeptide comprising a functional domain, or a subsequence
thereof, can be obtained. For the production of antibody,
various host animals can be immunized by injection with the
native polypeptide comprising a functional domain, or a
synthetic version, or fragment thereof, including but not
limited to rabbits, mice, rats, etc. Various adjuvants may be
used to increase the immunological response, depending on the
host species, and including but not limited to Freund's
(complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin,
pluronic polyols, polyanions, peptides, oil emulsions, keyhole
limpet hemocyanins, dinitrophenol, and potentially useful
human adjuvants such as BCG (bacille Calmette-Guerin) and
Corynebacterium parvum.

For preparation of monoclonal antibodies, any 
technique which provides for the production of antibody 
molecules by continuous cell lines in culture may be used. 
For example, the hybridoma technique originally developed by
Kohler and Milstein (1975, Nature 256, 495-497), as well as
the trioma technique, the human B-cell hybridoma technique
(Kozbor et al., 1983, Immunology Today 4, 72), and the EBV-
hybridoma technique to produce human monoclonal antibodies
(Cole et al., 1985, in Monoclonal Antibodies and Cancer
Therapy, Alan R. Liss, Inc., pp. 77-96), may be used.

Antibody fragments which contain the idiotype 
(binding domain) of the molecule can be generated by known 
techniques. For example, such fragments include but are not 
limited to: the F(ab')₂ fragment which can be produced by
pepsin digestion of the antibody molecule; the Fab' fragments
which can be generated by reducing the disulfide bridges of
the F(ab')₂ fragment, and the Fab fragments which can be
generated by treating the antibody molecule with papain and a
reducing agent.

In the production of antibodies, screening for the
desired antibody can be accomplished by techniques known in
the art, e.g., ELISA (enzyme-linked immunosorbent assay).

6. EXAMPLES

6.1. Identification of Genes from cDNA Expression
Libraries

A study was initiated to determine whether peptide
recognition units could recognize functional domains that are
the same as or similar to their target functional domain but
that are contained in proteins other than the protein
containing their target functional domain. Such "functional"
screens, using recognition units of relatively small size,
were not previously known and were difficult to develop
because of the low degree of sequence homology among
functional domain-containing proteins. Thus, for example, an
oligonucleotide probe could not be designed with any degree of
confidence based on the low degree of homology of primary
sequences of SH3 domains.

Using SH3 domain-binding peptides from combinatorial 
peptide libraries as recognition units, we screened a series 
of mouse and human cDNA expression libraries. We found that
69 of the 74 clones isolated from the libraries encoded at
least one SH3 domain. These clones represent more than 18 
different SH3 domain-containing proteins, of which more than 
10 have not been described previously. 

The initial recognition unit chosen was a Src SH3
domain-binding peptide (termed pSrcCII) isolated from a phage-
displayed random peptide library (Sparks et al., 1994, J.
Biol. Chem. 269:23853-23856). pSrcCII was (biotin-
SGSGGILAPPVPPRNTR-NH₂) (SEQ ID NO:1). pSrcCII was synthesized
by standard FMOC chemistry, purified by HPLC, and its
structure was confirmed by mass spectrometry and amino acid
analysis. To form multivalent complexes, 50 pmol biotinylated
pSrcCII peptide was incubated with 2 μg streptavidin-alkaline
phosphatase (SA-AP) (for a biotin:biotin-binding site ratio of
1:1). Excess biotin-binding sites were blocked by addition of
500 pmol biotin. Alternatively, 31.2 μl of 1 mg/ml SA-AP
could have been incubated with 15 μl of 0.1 mM biotinylated
peptide for 30 min at 4°C. Ten μl of 0.1 mM biotin would
then be added, and the solution incubated for an additional 15
min.
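
The volumes and concentrations in the alternative recipe above can be converted to absolute amounts with simple unit arithmetic. The Python sketch below is illustrative only; the helper names picomoles and micrograms are hypothetical, and no molecular weight for the SA-AP conjugate is assumed, so the conjugate is reported by mass only.

```python
def picomoles(volume_ul, concentration_mM):
    """Amount in pmol for a volume in microliters at a millimolar
    concentration (1 mM = 1 nmol/ul = 1000 pmol/ul)."""
    return volume_ul * concentration_mM * 1000.0


def micrograms(volume_ul, concentration_mg_per_ml):
    """Mass in micrograms (1 mg/ml equals 1 ug/ul)."""
    return volume_ul * concentration_mg_per_ml


peptide_pmol = picomoles(15, 0.1)  # biotinylated peptide, 15 ul of 0.1 mM
biotin_pmol = picomoles(10, 0.1)   # free biotin for blocking, 10 ul of 0.1 mM
sa_ap_ug = micrograms(31.2, 1.0)   # SA-AP conjugate, 31.2 ul of 1 mg/ml
```

These work out to about 1500 pmol of peptide, about 1000 pmol of free biotin, and 31.2 μg of SA-AP.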

A λEXlox mouse 16 day embryo cDNA expression library
was obtained from Novagen (Madison, WI). The cDNA library was
screened according to published protocols (Young and Davis,
1983, Proc. Natl. Acad. Sci. USA 80:1194-1198). The library
was plated at an initial density of 30,000 plaques/100 mm
petri plate as follows. A library aliquot was diluted 1:1000
in SM (100 mM NaCl, 8 mM MgSO₄, 50 mM Tris-HCl pH 7.5, 0.01%
gelatin). Three μl of diluted phage were added to 1.5 ml each
of SM, 10 mM CaCl₂/MgCl₂, and an overnight culture of
BL21(DE3)pLysE E. coli cells. BL21 overnight cultures were
grown in 2xYT medium (1.6% tryptone, 1% yeast extract, and
0.5% NaCl) supplemented with 10 mM MgSO₄, 0.2% maltose, and
25 μg/ml chloramphenicol. This mixture was incubated 20 min at
37°C, after which 300 μl were plated on each of 14 2xYT agar
plates in 3 ml 0.8% 2xYT top agarose containing 25 μg/ml
chloramphenicol. Plaques were allowed to form for 6 hours at
37°C, after which isopropyl-β-D-thiogalactopyranoside (IPTG)-
soaked filters were applied. After an additional eight hours'
incubation at 37°C, the filters were marked, removed from the
plates, and washed three times with phosphate buffered saline
(PBS; 137 mM NaCl, 2.7 mM KCl, 4.3 mM Na₂HPO₄, 1.4 mM KH₂PO₄),
0.1% Triton X-100. The filters were blocked for 1 hour in
PBS, 2% bovine serum albumin (blocking solution) and
subsequently incubated overnight at 4°C with fresh blocking
solution plus streptavidin-alkaline phosphatase (SA-AP)
complexed peptide. Approximately 1 μg SA-AP complexed with
peptide in 1 ml blocking solution was used for each filter.
The filters were then subjected to four 15 minute washes with
PBS, 0.1% Triton X-100. Bound SA-AP-peptide complexes were
detected by incubation with 44 μl nitroblue tetrazolium
chloride (NBT, 75 mg/ml in 70% dimethylformamide) and 33 μl of
5-bromo-4-chloro-3-indolyl-phosphate-p-toluidine salt (BCIP, 50
mg/ml in dimethylformamide) in 10 ml of alkaline phosphatase
buffer (0.1 M Tris-HCl, pH 9.4, 0.1 M NaCl, 50 mM MgCl₂); the
signals were robust, often evident within a few minutes.
Positive plaques were cored with a Pasteur pipet and placed in
1 ml SM with a drop of chloroform. Lambda phage particles are
structurally resistant to chloroform, which serves as a
bactericidal agent. These cores were allowed to diffuse into
solution for at least 1 hr before subsequent platings. Phage
from cores were plated in 100 μl each of SM, 10 mM CaCl₂/MgCl₂,
and an overnight culture of BL21(DE3)pLysE cells. Phage
were plated with the intention of reducing the number of
plaque forming units (pfu)/plate by roughly a factor of 10
with each screen (i.e., 3 × 10⁴ in the primary screen, 3 × 10³
in the secondary, and so on). This was accomplished by
diluting cores 1:1000 and plating 1-10 μl/plate. Four screens
were generally required to obtain isolated plaques. 
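
The plating scheme described above, in which the plaque density is cut roughly tenfold at each round, can be tabulated with a few lines of Python. This is a sketch only: the function name target_pfu_per_screen is hypothetical, and the values for the third and fourth rounds are an extrapolation of the "and so on" in the text rather than figures reported in the example.

```python
def target_pfu_per_screen(initial_pfu, screens, fold=10):
    """Target plaque-forming units per plate for each successive screen,
    reducing the density by `fold` at every round."""
    return [initial_pfu / fold ** n for n in range(screens)]


# Primary screen at 30,000 pfu/plate, four rounds in total.
targets = target_pfu_per_screen(30_000, 4)
```

With four rounds this gives nominal targets of 30,000, 3,000, 300, and 30 pfu per plate, which is consistent with isolated plaques being obtained by about the fourth screen.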

Plasmids were rescued from the λEXlox phage by cre-
mediated excision in BM25.8 E. coli cells. For each clone, 5
μl of a 1:100 dilution of phage were added to a solution
containing 100 μl SM and 100 μl of an overnight culture of
BM25.8 cells (grown in 2xYT media supplemented with 10 mM
MgSO₄, 0.2% maltose, 34 μg/ml chloramphenicol, and 50 μg/ml
kanamycin). After 30 minutes at 37°C, 100 μl of this
solution were spread on an LB amp agarose plate and incubated
overnight at 37°C. A single colony from each plate was used
to inoculate 3 ml of 2xYT/amp and incubated overnight.
Plasmid DNA was purified from the overnight culture using 
Promega Wizard Miniprep DNA purification kits (Promega,
Madison, WI), extracted with an equal volume of
phenol/chloroform followed by chloroform alone, and ethanol
precipitated. This plasmid DNA was used to transform
chemical-competent DH5α cells. Three colonies from each
transformation were used to inoculate 3 ml cultures; DNA was
purified as described above. Approximately 1/20 of each
individually purified DNA sample from transformed cells was
digested with EcoRI and HindIII and examined by
electrophoresis on a 1% agarose gel to determine insert size
and DNA quality. One DNA prep for each clone was sequenced
either manually using the dideoxy method or by an automated
technique that uses fluorescent dideoxynucleotide terminators.
The T7-gene 10 primer located approximately 40 bp upstream of
the EcoRI restriction site was used conveniently in both
cases.

Approximately 100 of 1 x 10^6 plaques in the primary
screen of the λEXlox 16 day mouse embryo cDNA expression
library exhibited significant pSrcCII-binding activity.
Figure 5 is representative of filters from primary and
tertiary screens. Of the eighteen positive clones that were
isolated and sequenced, all were found to encode proteins with
SH3 domains, although several clones appeared to be siblings
or to originate from the same mRNA. Thus, the pSrcCII screen
resulted in the identification of cDNAs encoding nine distinct
SH3 domain-containing proteins (see Figure 9). The sequences
of these proteins were compared to the sequences in GenBank
with the computer program BLAST. Three of these proteins
corresponded to entries in GenBank. SH3P1 appears to be the
murine homologue of p53bp2, a p53-binding protein
(Iwabuchi et al., 1994, Proc. Natl. Acad. Sci. USA
91:6098-6102); SH3P6 resembles human MLN50, a gene amplified
in some breast carcinomas (Tomasetto et al., 1995, Genomics
28:367-376); and SH3P5 is Cortactin, a protein implicated in
cytoskeletal organization (Wu and Parsons, 1993, J. Cell Biol.
120:1417-1426). Six of the clones did not match entries in
GenBank, indicating that the present invention can be used to
identify novel SH3 domain-containing proteins. Of these novel
proteins, SH3P2 contains three ankyrin repeats and a
proline-rich region flanking its SH3 domain; SH3P7 and SH3P9
contain sequences related to regions in the proteins drebrin
(Ishikawa et al., 1994, J. Biol. Chem. 269:29928-29933) and
amphiphysin (David et al., 1994, FEBS Lett. 351:73-79),
respectively. Finally, the novel proteins SH3P4 and SH3P8,
although not similar to any known proteins, are highly related
(89% amino acid similarity) to one another.

The present invention can be used as part of an
iterative process in which a recognition unit is used to
identify proteins containing functional domains which are, in
turn, used to derive additional recognition units for
subsequent screens. For example, to define the binding
specificity of these newly cloned SH3 domains, they can be
overexpressed as glutathione S-transferase (GST)-fusion
proteins in bacteria, which, in turn, can be used to screen a
random peptide library in order to obtain recognition units
which, in turn, can be used to screen cDNA libraries in order
to obtain still more novel proteins containing SH3 domains.

The recognition unit binding preferences of two of
the SH3 domains isolated in the pSrcCII screen described above
(p53bp2 and Cortactin) have been described (Sparks et al.,
1996, Proc. Natl. Acad. Sci. USA 93:1540-1544). Each of these
SH3 domains recognizes recognition unit motifs related to, yet
distinct from, the pSrcCII sequence. We used a synthetic
peptide (pCort) containing the Cortactin SH3 recognition unit
motif to screen the mouse embryo cDNA expression library.
pCort was (biotin-SGSGSRLTPQSKPPLPPKPSWVSR-NH2) (SEQ ID NO: 2).
pCort was prepared and complexed with SA-AP as above for
pSrcCII. Screening of the mouse embryo library with pCort was
done as above for pSrcCII.

Twenty-six clones, of varying signal strength, were
isolated and twenty-one were found to encode SH3
domain-containing proteins. The pCort screen yielded genes
corresponding to nine distinct SH3 domain-containing proteins
(see Figure 9), four of which corresponded to entries in
GenBank. SH3P5 and SH3P6 are Cortactin and MLN50, discussed
above; SH3P10 matched SPY75/HS1, a protein involved in IgE
signaling (Fukamachi et al., 1994, J. Immunol. 152:642-652);
and SH3P11 is Crk, an SH2 domain and SH3 domain-containing
adaptor molecule (Knudsen et al., 1994, J. Biol. Chem.
269:32781-32787). The five novel transcripts encode SH3P7,
SH3P8, and SH3P9, discussed above; SH3P13, an additional
member of the SH3P4/SH3P8 family; and SH3P12, a protein with
three SH3 domains and a region sharing significant sequence
similarity with the peptide hormone sorbin (Vagne-Descroix
et al., 1991, Eur. J. Biochem. 201:53-59).

Interestingly, the output from the pCort screen only
partially overlapped with that of the pSrcCII screen: four of
the nine SH3-containing proteins isolated with pCort were not
identified with pSrcCII. In addition, SH3P9, the protein
identified most frequently (50%) in the pSrcCII screen, was
isolated at a much lower frequency (7%) with the pCort probe.
Thus, different recognition units can be used to identify
distinct sets of SH3 domains.
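
The partial overlap between the two screens can be restated as simple set arithmetic. The sketch below is illustrative only. It uses the clone names given in the text and assumes that the pSrcCII screen yielded SH3P1 through SH3P9; SH3P3 is not discussed individually, so its assignment to that screen is an assumption made here to reach the stated total of nine.

```python
# Proteins recovered in each screen, using the names given in the text.
# Assumption: the nine pSrcCII proteins are SH3P1..SH3P9 (SH3P3 is not
# discussed individually in the text).
psrccii = {f"SH3P{i}" for i in range(1, 10)}
pcort = {
    "SH3P5", "SH3P6", "SH3P7", "SH3P8", "SH3P9",   # also found with pSrcCII
    "SH3P10", "SH3P11", "SH3P12", "SH3P13",        # found only with pCort
}

shared = psrccii & pcort        # recovered by both recognition units
pcort_only = pcort - psrccii    # recovered only by pCort
combined = psrccii | pcort      # all distinct proteins from the two screens


def clone_number(name: str) -> int:
    """Sort helper: 'SH3P12' -> 12."""
    return int(name[len("SH3P"):])


print("shared:    ", sorted(shared, key=clone_number))
print("pCort only:", sorted(pcort_only, key=clone_number))
print("total distinct proteins:", len(combined))
```

With these sets, four proteins are unique to pCort and the two screens together account for thirteen distinct proteins, matching the counts quoted in the surrounding paragraphs.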

In addition to possessing at least one SH3 domain, a
prominent characteristic of the proteins identified in the
pSrcCII and pCort screens is the position of the SH3 domain
within the proteins: twelve of thirteen proteins possess SH3
domains near their C-termini. Although pSrcCII binds well to
the Src SH3 domain (Figure 8), Src (whose SH3 domain occurs
near the N-terminus) was not identified in the pSrcCII screen.
We suspect the bias was a consequence of the fact that the
mouse embryo cDNA library was constructed using
oligo-dT-primed cDNA: because oligo-dT priming begins at the
3' poly(A) tail, the resulting inserts tend to encode the
C-terminal portions of proteins. Alternatively, it may be that
the mRNA used to prepare the library contained very few, or
no, Src transcripts.

A variant of the pSrcCII peptide (T12SRC.1) was used
to probe a λgt22a human prostate cancer cell line cDNA library
primed with oligo-dT and a λgt11 human bone marrow library
primed with random and oligo-dT primers. T12SRC.1 was
(biotin-GILAPPVPPRNTR-NH2) (SEQ ID NO: 3). T12SRC.1 was used
in the initial screens together with the peptide T12SRC.4.
T12SRC.4 was (biotin-VLKRPLPIPPVTR-NH2) (SEQ ID NO: 4). The
λgt22a human prostate cancer cell line cDNA library was made
from the LNCaP prostate cancer cell line by using standard
methods, i.e., the Superscript Lambda system for cDNA
synthesis and cloning (Bethesda Research Laboratories,
Gaithersburg, MD). The λgt11 human bone marrow cDNA
expression library was obtained from Clontech (Palo Alto, CA).
The human libraries were screened and positive clones isolated
as described above for the mouse 16 day embryo cDNA library,
except that cDNA inserts of the λgt11 and λgt22a phage were
amplified by PCR rather than being rescued by cre-mediated
excision. Of the 1.2 x 10^7 λcDNA clones screened from these
libraries, 30 exhibited detectable pSrcCII-binding activity.
Analysis of the positive clones revealed that they each
encoded at least one SH3 domain, and that they originated from
a total of six different transcripts (Figure 9). Three of
these encode proteins possessing non-C-terminal SH3 domains,
indicating that the present invention can be used to identify
active domains regardless of their position within a protein.
Of the six proteins identified, only three matched GenBank
entries. SH3P15 and SH3P16 are Fyn (Kawakami et al., 1988,
Proc. Natl. Acad. Sci. USA 85:3870-3874) and Lyn (Yamanashi et
al., 1987, Mol. Cell. Biol. 7:237-243), respectively, two
Src-family members possessing SH3 domains with ligand
preferences similar to that of the Src SH3 domain (Rickles,
1994, EMBO J. 13:5598-5604); and SH3P14 appears to be the
human homologue of murine H74, a protein of unknown function.
The three remaining proteins did not match entries in GenBank
and include the human homologue of SH3P9, described above, and
SH3P17 and SH3P18, fragments of two related (85% amino acid
similarity) adaptor-like proteins comprised of at least four
and three SH3 domains, respectively.

Examination of the primary sequences of the SH3
domains identified in this work reveals several interesting
features (see Figure 10). Positions important for ligand
binding by the Src SH3 domain (Feng et al., 1994, Science
266:1241-1247; Lescure et al., 1992, J. Mol. Biol. 228:387-94)
and essential for SH3 function in Grb2/Sem5 are conserved
(Clark et al., 1992, Nature 356:340-344). In addition, the
two gaps in the sequence alignment shown in Figure 10
correspond to regions of length variation observed among
previously characterized SH3 domains. Surprisingly, the SH3
domains identified in this work are not significantly more
similar to one another than they are to other known SH3
domains, with the exception of the mouse and human forms of
SH3P9 and SH3P14 which are 100% and 83% identical,
respectively. This result indicates that SH3 domains can vary
widely in primary structure and still bind proline-rich
peptide recognition units selectively.
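
Percent identity figures such as those quoted above are computed from an alignment. The sketch below shows one common convention, counting identical residues over the aligned columns in which neither sequence has a gap. It is an assumption that this convention matches the one used for Figure 10; the text does not say. The function name and the short test strings are hypothetical and are not SH3 domain sequences from this work.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence has a gap character are skipped, so
    the denominator is the number of residue-to-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # length variation between domains is not scored
        compared += 1
        if res_a == res_b:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy strings for illustration only (not sequences from Figure 10).
no_gap = percent_identity("ALYDYEAR", "ALYDFEAR")   # 7 of 8 columns identical
with_gap = percent_identity("ALY-DYE", "ALYQDFE")   # 5 of 6 ungapped columns
print(f"{no_gap:.1f}%  {with_gap:.1f}%")
```

Other conventions divide by the full alignment length or by the shorter sequence length, which gives lower values when gaps are present, so the convention should be stated alongside any reported figure.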

6.1.1. Nucleotide and Corresponding Amino Acid
Sequences of Genes Identified from cDNA 
Expression Libraries 

The nucleotide sequences of SH3P1, SH3P2, SH3P3, 
SH3P4, SH3P5, SH3P6, SH3P7, SH3P8, SH3P9, SH3P10, SH3P11, 
SH3P12, SH3P13, and SH3P14, the mouse genes identified by 
screening the 16 day mouse embryo cDNA expression library with 
the peptides pSrcII and pCort, are shown in Figures 18, 20,
22, 24, 26, 28, 30, 32, 34, 38, 40, 42A and B, 44, and 46A and
B, respectively. The corresponding amino acid sequences of
the mouse genes SH3P1, SH3P2, SH3P3, SH3P4, SH3P5, SH3P6,
SH3P7, SH3P8, SH3P9, SH3P10, SH3P11, SH3P12, SH3P13, and 
SH3P14 are shown in Figures 19, 21, 23, 25, 27, 29, 31, 33, 
35, 39, 41, 43, 45, and 47, respectively. 

The nucleotide sequences of SH3P9, SH3P14, SH3P17, 
and SH3P18, human genes identified by screening the human bone 
marrow and human prostate cancer cDNA expression libraries 
with the peptide T12SRC.1, are shown in Figures 36, 48, 50, 
and 52, respectively. The corresponding amino acid sequences 
of the human genes SH3P9, SH3P14, SH3P17, and SH3P18 are shown 
in Figures 37, 49, 51, and 53, respectively. 

Two genes, SH3P9 and SH3P14, were isolated from both 
mouse and human libraries. 

The sequences of SH3P15 and SH3P16 are not shown. 
SH3P15 is Lyn and SH3P16 is Fyn. 

Figure 54 shows the nucleotide sequence of clone 55, 
a novel human gene identified and isolated from a human bone 
marrow cDNA library (described in Section 6.1) using as 
recognition units a mixture of T12SRC.4 and pCort (described 
in Section 6.1) and the methods described in Section 6.1. 

Figure 55 shows the amino acid sequence of clone 55. 

Figure 56 shows the nucleotide sequence of clone 56, 
a novel human gene identified and isolated from a human bone 
marrow cDNA library (described in Section 6.1) using as 
recognition units a mixture of T12SRC.4 and pCort (described 
in Section 6.1) and the methods described in Section 6.1. 

Figure 57 shows the amino acid sequence of clone 56.

Figure 58A shows the nucleotide sequence from
position 1-1720 and Figure 58B shows the nucleotide sequence
from position 1720-2873 of clone 65, a novel human gene
identified and isolated from a human bone marrow cDNA library
(described in Section 6.1) using as recognition units a
mixture of P53BP2.Con and Nck1.Con3 and the methods described
in Section 6.1. P53BP2.Con and Nck1.Con3 are peptides, the
amino acid sequences of which are
biotin-SFAAPARPPVPPRKSRPGG-NH2 (SEQ ID NO: 201) and
biotin-SFSFPPLPPAPGG-NH2 (SEQ ID NO: 202), respectively. The
sequences of P53BP2.Con and Nck1.Con3 are consensus sequences
of recognition units that bind to the SH3 domains of p53bp2
and Nck, respectively.

Figure 59 shows the amino acid sequence of clone 65.

Figure 60 shows the nucleotide sequence of clone 34,
a novel human gene identified and isolated from a human
prostate cancer cDNA library (described in Section 6.1) using
as recognition units a mixture of T12SRC.1 and T12SRC.4
(described in Section 6.1) and the methods described in
Section 6.1.

Figures 61A and 61B show the amino acid sequence of
clone 34.

Figure 62 shows the nucleotide sequence of clone 41,
a novel human gene identified and isolated from a human bone
marrow cDNA library (described in Section 6.1) using as
recognition units a mixture of PXXP.NCK.S1/4 and
PXXP.ABL.G1/2M and the methods described in Section 6.1.
PXXP.NCK.S1/4 and PXXP.ABL.G1/2M are peptides, the amino acid
sequences of which are biotin-SRSLSEVSPKPPIRSVSLSR-NH2 (SEQ ID
NO: 222) and biotin-SRPPRWSPPPVPLPTSLDSR-NH2 (SEQ ID NO: 223),
respectively. PXXP.NCK.S1/4 and PXXP.ABL.G1/2M bind to the
SH3 domains of Nck and Abl, respectively.

Figures 63A and 63B show the amino acid sequence of
clone 41.

Figure 64 shows the nucleotide sequence of clone 53,
a novel human gene identified and isolated from a human
prostate cancer cDNA library (described in Section 6.1) using
as recognition units a mixture of PXXP.NCK.S1/4 and
PXXP.ABL.G1/2M and the methods described in Section 6.1.

Figures 65A and 65B show the amino acid sequence of
clone 53.

Figures 66A and 66B show the nucleotide and amino
acid sequence of clone 5, a novel human gene identified and
isolated from a HeLa cell cDNA library using as recognition
units a mixture of T12SRC.1 and T12SRC.4 (described in Section
6.1) and the methods described in Section 6.1.

6.2. Use of Peptides Resembling SH3 Domain Binding
Sequences as Recognition Units

We inspected a number of published amino acid
sequences and identified proline-rich stretches of amino acids
that resembled consensus SH3 domain binding sequences.
Peptides comprising these proline-rich sequences were
synthesized and tested by the methods of the present invention
for their ability to specifically bind to the novel SH3
domains described in Sections 6.1 and 6.1.1. Purified SH3
domain-containing clones were spotted on a lawn of Y1090 host
cells, grown for an appropriate amount of time, and plaque
filter lifts were screened with biotinylated peptides
complexed with streptavidin-alkaline phosphatase as described
in Section 6.1.

The results are shown in Figures 12 and 13. As can
be seen, in many cases the synthesized peptides were able to
bind to the novel SH3 domains. This indicates that those
synthesized peptides could have been used to identify those
novel SH3 domains from sources of polypeptides.
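
Inspecting published sequences for proline-rich stretches of this kind can be approximated with a pattern search. The sketch below looks for the minimal PxxP core (proline, any two residues, proline) that is characteristic of many SH3 ligands and that appears in peptide names such as PXXP.NCK.S1/4. It is a simplification offered for illustration, not the criterion used by the inventors: real consensus sequences also constrain the flanking residues. The function name is hypothetical; the peptide sequences are those given in this document.

```python
import re

# Lookahead so that overlapping PxxP cores (e.g. in "PPVPP") are all reported.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched four-residue core) for each PxxP core."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


peptides = {
    "T12SRC.1 (Src class II)": "GILAPPVPPRNTR",
    "T12SRC.4 (Src class I)": "VLKRPLPIPPVTR",
    "CaM-binding control": "STVPRWIEDSLRGGAARAQTRLASAK",
}

for name, seq in peptides.items():
    hits = find_pxxp(seq)
    print(f"{name}: {hits if hits else 'no PxxP core'}")
```

A hit from such a scan only nominates a candidate; as described above, binding was established experimentally by probing filter lifts with the synthesized, biotinylated peptides.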


6.3. Valency of Peptide Recognition Units Affects 
Specificity of Recognition Units 

6.3.1. Preconjugation of Peptide Recognition Units
with Streptavidin-Alkaline Phosphatase
Increases Affinity of the Recognition Units for
Targets

As a preliminary test of the effect of the valency
of peptide recognition units on the ability of those
recognition units to be used as probes to detect SH3 domains,
biotinylated peptides that had been previously shown to bind
the SH3 domains of either Src or Abl were tested for their
ability to bind their respective SH3 domain when either
preconjugated with streptavidin-alkaline phosphatase (SA-AP)
or not so preconjugated. GST-SrcSH3 and GST-AblSH3 fusion
proteins (produced as described in Sparks et al., 1994, J.
Biol. Chem. 269:23853-23856) were resolved by 10% SDS-PAGE and
transferred to Immobilon D nylon membranes (Millipore,
Bedford, MA). The membranes were incubated in blocking
solution for 1 hr at 25 °C and then incubated overnight at
4 °C with either biotinylated Src SH3 domain or biotinylated
Abl SH3 domain binding peptides in either multivalent (SA-AP)
or monovalent format. The filters were washed three times (15
min each wash) in PBS/T and incubated with NBT and BCIP for
color development. See Section 6.1 for further details of the
detection process.

The results are shown in Figure 14. In panels A,
the biotinylated peptides were preconjugated with SA-AP and
then allowed to bind to the immobilized SH3 domains.
Preconjugation was as described in Section 6.1. In panels B,
the peptides were first allowed to bind to the immobilized SH3
domains and then the bound peptides were detected by adding
SA-AP. In both cases, color development was as in Section
6.1. The sequences of the peptides used were:
Biotin-SGSGGILAPPVPPRNTR (SEQ ID NO:1) for the Src specific
peptide and Biotin-SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41) for
the Abl specific peptide. The results shown in Figure 14
demonstrate that preconjugation with SA-AP dramatically
increases the strength of the signal detected.

6.3.2. Preconjugation of Peptide Recognition Units
with Streptavidin-Alkaline Phosphatase Results
in Recognition of a Variety of SH3 Domains

Two µg of each of a panel of GST-SH3 domain fusion
proteins were transferred to Immobilon D nylon membranes
(Millipore, Bedford, MA) using a dot-blot apparatus.
Biotinylated Src, Abl, or Cortactin SH3 domain-binding
peptides were preconjugated to SA-AP and incubated with the
filter; an alkaline phosphatase-driven color reaction was used
to detect peptide binding. The panel of immobilized proteins
was also reacted with a polyclonal anti-GST antibody
(Pharmacia, Piscataway, NJ). Sequences of the Src, Abl, and
Cortactin-binding peptides were Biotin-SGSGVLKRPLPIPPVTR (SEQ
ID NO:42), Biotin-SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41), and
Biotin-SGSGSRLGEFSKPPIPQKPTWMSR (SEQ ID NO:43), respectively.

As can be seen from the results shown in Figure 15,
the preconjugated biotinylated peptides recognized not only
their original target SH3 domains, but related domains as
well. The Src peptide recognized the SH3 domains of Yes and
Cortactin as well as the SH3 domain of Src; the Abl peptide
recognized the Cortactin SH3 domain as well as the Abl SH3
domain; and the Cortactin peptide recognized Src, Yes, Abl,
Crk, and the C-terminal Grb2 SH3 domains as well as
recognizing the Cortactin SH3 domain.

The above experiment was performed utilizing SH3
domains that had been immobilized on nylon membranes. The
following demonstrates that preconjugation with streptavidin
also permits peptide recognition units to recognize a variety
of SH3 domains when those domains are immobilized in the wells
of a microtiter plate.

Five different peptide recognition units (pAbl,
pPLC, pCrk, pSrcCI, pSrcCII) were tested in either multivalent
or monovalent format for their ability to bind to seven
different SH3 domains (Src, Abl, PLCγ, Crk, Cortactin, Grb2N,
Grb2C) in an ELISA. The sequences of these peptides were as
follows: pAbl, SGSGSRPPRWSPPPVPLPTSLDSR (SEQ ID NO:41); pPLC,
SGSGSMPPPVPPRPPGTLGG (SEQ ID NO:66); pCrk,
SGSGNYVNALPPGPPLPAKNGG (SEQ ID NO:67); pSrcCI,
SGSGVLKRPLPIPPVTR (SEQ ID NO:42); pSrcCII, SGSGGILAPPVPPRNTR
(SEQ ID NO:1). These peptides were biotinylated as in Section
6.1.

The SH3 domains were produced as GST-SH3 fusion
proteins as described in Sparks et al., 1994, J. Biol. Chem.
269:23853-23856. Their purity and concentration were
confirmed by SDS-PAGE and Bradford protein assays,
respectively. The GST-SH3 fusion proteins were immobilized in
the wells of microtiter plates as follows: Two micrograms of
each GST-SH3 fusion protein were incubated in wells of a flat
bottom enzyme linked immunosorbent assay (ELISA) microtiter
plate (Costar, Cambridge, MA) in 100 mM NaHCO3 for 1 hr at
25 °C. One volume of SuperBlock blocking buffer (Pierce
Chemical Co., Rockford, IL) was added to each well and
incubated for an additional 30 min. Plates were washed three
times with PBS/0.1% Tween-20/0.1% bovine serum albumin (BSA).
Immobilized proteins were detected with SH3 domain-binding
peptides in multivalent or monovalent formats using
streptavidin-horseradish peroxidase (SA-HRP; Sigma Chemical
Co., St. Louis, MO). For complexation of the biotinylated
peptides and SA-HRP, peptide and SA-HRP concentrations were as
described for SA-AP complexation in Section 6.1, but all
incubations and washes were in PBS/0.1% Tween-20/0.1% BSA.
Plates were washed five times before colorimetric reaction and
before the addition of SA-HRP (monovalent format). The amount
of bound SA-HRP was evaluated with the addition of 100 µl
horseradish peroxidase substrate
[2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (ABTS),
0.05% hydrogen peroxide, 50 mM sodium citrate, pH 5.0]. After
5-30 minutes of reaction time, the optical densities (OD) of
the microtiter plate wells were measured with a microtiter
plate scanner (Molecular Devices, Sunnyvale, CA) set for 405
nm wavelength. The results are shown in Figure 8. From Figure
8 it can be seen that the tetravalent (complexed) peptides
display both increased affinity and broadened specificity
toward SH3 targets. Binding of complexed peptides was,
however, still restricted to SH3 domains; the complexes bind
to neither GST (Figure 8) nor other unrelated proteins (data
not shown). Thus, precomplexation with SA-AP decreases the
specificity of the peptide recognition units but does not make
the peptides non-specific. Rather, the peptides, when
precomplexed, recognize a variety of SH3 domains in addition
to their target domains.

6.3.3. Preconjugation of Peptide Recognition Units
with Streptavidin-Alkaline Phosphatase Results
in Recognition of a Variety of Expressed cDNA
Clones

Lambda phage clones of genes containing a variety of
SH3 domains were isolated from screens of a 16 day mouse
embryo cDNA expression library (Novagen, Madison, WI). For a
description of the isolation of these cDNA clones, see Section
6.1. Phage particles corresponding to individual lambda phage
cDNA recombinants were spotted onto 2xYT-1.5% agar petri
plates onto which had been poured 3 ml of 2xYT-0.8% agarose
with 100 µl of a BL21(DE3)pLysE E. coli culture grown
overnight. After a 6 hr incubation at 37 °C, expression of
the cDNA segments was induced with IPTG-soaked nitrocellulose
filters. After overnight incubation, the expressed proteins
had been transferred to the filters and the filters were then
incubated with either biotinylated SH3 domain-binding peptides
preconjugated to SA-AP or a monoclonal antibody recognizing
the T7-Tag fusion peptide (aT7.10Mab; Novagen, Madison, WI).
This antibody was used as a positive control since it
recognized an epitope expressed by all the clones (part of the
φ10 leader sequence common to all λEXlox recombinants).
Sequences of pSrcI, pSrcII, Cortactin, and CaM (Calmodulin
binding) peptides were Biotin-SGSGVLKRPLPIPPVTR (SEQ ID
NO:42), Biotin-SGSGGILAPPVPPRNTR (SEQ ID NO:1),
Biotin-SGSGSRLGEFSKPPIPQKPTWMSR (SEQ ID NO:43), and
Biotin-STVPRWIEDSLRGGAARAQTRLASAK (SEQ ID NO:44), respectively.

The results are shown in Figure 16. From Figure 16
it can be seen that precomplexation with SA-AP decreases the
specificity of the peptide recognition units but does not make
the peptides non-specific; none of the peptides react in a
significant fashion with two negative control sequences,
α-actinin and calmodulin (CaM). Rather, the peptides, when
precomplexed, recognize a variety of SH3 domain-containing
cDNA clones in addition to clones containing their target
domains.

6.4. Characterization of cDNA clone-encoded proteins
6.4.1. Production of cDNA clone-encoded proteins

Purified DNA from all positive cDNA clones (ca. 18-20
positive clones per recognition unit) was used to transform
chemical-competent BL21 cells (Hanahan et al., 1983, J. Mol.
Biol. 166:557-580, the complete disclosure of which is
incorporated by reference herein).

Colonies that appeared after growth overnight at
37 °C on 2xYT agar plates containing 100 µg/ml ampicillin were
used to inoculate 4 ml cultures of 2xYT/amp. After 7 hours of
incubation at 37 °C with shaking, IPTG was added to each
culture to a final concentration of 100 µM. After an
additional 2 hours of incubation, 1 ml of each culture was
collected and centrifuged to pellet the cells. Cell pellets
were resuspended in 400 µl 1x SDS/DTT loading buffer and
boiled at 100 °C for 5 min. The resulting cell lysates were
subjected to Sodium Dodecyl Sulfate-Polyacrylamide Gel
Electrophoresis (SDS-PAGE) on an 8% acrylamide gel. Gels were
either Coomassie stained or transferred to Immobilon D
membrane (Millipore) and blotted (Towbin et al., 1979, Proc.
Natl. Acad. Sci. 76:4350-4354).
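
The induction step above specifies a final IPTG concentration and a culture volume but not a stock concentration. As a minimal sketch of the dilution arithmetic (C1·V1 = C2·V2), the following assumes a hypothetical 100 mM IPTG stock; that value and the function name are illustrative assumptions, and the small volume added is neglected.

```python
def stock_volume_ul(final_conc_um: float, culture_volume_ml: float,
                    stock_conc_mm: float) -> float:
    """Volume of stock (in µl) needed to reach final_conc_um in the culture.

    Uses C1*V1 = C2*V2 and ignores the volume contributed by the stock
    itself, which is negligible when the stock is far more concentrated
    than the target.
    """
    stock_conc_um = stock_conc_mm * 1000.0          # mM -> µM
    culture_volume_ul = culture_volume_ml * 1000.0  # ml -> µl
    return final_conc_um * culture_volume_ul / stock_conc_um


# From the text: 4 ml culture, 100 µM final IPTG.
# Assumption (not in the text): a 100 mM IPTG stock.
iptg_ul = stock_volume_ul(final_conc_um=100.0, culture_volume_ml=4.0,
                          stock_conc_mm=100.0)
print(f"add {iptg_ul:.1f} µl of stock per 4 ml culture")
```

With a different stock concentration the volume scales inversely; a 1 M stock, for example, would call for one tenth of this volume.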


6.5. Materials Used in Sections 6.1, 6.2, 6.3.1, 6.3.2, 
6.3.3, and 6.4.1 



Blocking Solution
    Hepes (pH 8)              20 mM
    MgCl2                     5 mM
    KCl                       1 mM
    Dithiothreitol            5 mM
    Milk Powder               5% w/v

2xYT media (1 L)
    Bacto tryptone            16 g
    Yeast Extract             10 g
    NaCl                      5 g

2xYT agar plates
    2xYT + 15 g agar/L

2xYT top agarose (0.8%)
    2xYT + 8 g agarose/L

SDS/DTT loading buffer (10 ml of 5x solution)
    0.5 M Tris base           0.61 g
    8.5% SDS                  0.85 g
    27.5% sucrose             2.75 g
    100 mM DTT                0.154 g
    0.03% Bromophenol Blue    3.0 mg

Overnight cell cultures:
    Inoculate media with one isolated colony of appropriate
    cell type and incubate at 37 °C O/N with shaking

    BL21(DE3)pLysE
        2xYT media
        maltose               0.2%
        MgSO4                 10 mM
        Chloramphenicol       25 µg/ml

    BM25.8
        2xYT media
        maltose               0.2%
        MgSO4                 10 mM
        Chloramphenicol       34 µg/ml
        Kanamycin             50 µg/ml
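
The masses listed above for the SDS/DTT loading buffer follow from the stated concentrations and the 10 ml batch volume. The check below is a sketch for illustration. The molecular weights of Tris base (121.14 g/mol) and DTT (154.25 g/mol) are standard values supplied here, not taken from the text, and the percentages are assumed to be weight/volume. The function names are hypothetical.

```python
def grams_for_molar(molarity: float, volume_ml: float, mol_weight: float) -> float:
    """Mass (g) of solute for a given molarity (mol/L) and volume (ml)."""
    return molarity * (volume_ml / 1000.0) * mol_weight


def grams_for_percent_wv(percent: float, volume_ml: float) -> float:
    """Mass (g) of solute for a % w/v solution (g per 100 ml)."""
    return percent / 100.0 * volume_ml


BATCH_ML = 10.0          # "10 ml of 5x solution"
MW_TRIS_BASE = 121.14    # g/mol, standard value (assumption, not in the text)
MW_DTT = 154.25          # g/mol, standard value (assumption, not in the text)

recipe_g = {
    "Tris base (0.5 M)": grams_for_molar(0.5, BATCH_ML, MW_TRIS_BASE),
    "SDS (8.5%)": grams_for_percent_wv(8.5, BATCH_ML),
    "sucrose (27.5%)": grams_for_percent_wv(27.5, BATCH_ML),
    "DTT (100 mM)": grams_for_molar(0.100, BATCH_ML, MW_DTT),
    "Bromophenol Blue (0.03%)": grams_for_percent_wv(0.03, BATCH_ML),
}

for component, grams in recipe_g.items():
    print(f"{component:26s} {grams:.4f} g")
```

Rounded to the precision used in the list, the computed masses agree with the listed 0.61 g, 0.85 g, 2.75 g, 0.154 g, and 3.0 mg.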

6.6. Other Functional Domains and Recognition Units 

In a manner similar to that described above for SH3
domains, recognition units directed to other functional
domains of interest can be chosen for use in the present
method. For example, as recognition units for a study of GST
functional domains, the following GST-binding peptides can be
used to screen a plurality of polypeptides: Class I - CWSEWDGNEC
(SEQ ID NO:46), CGQWADDGYC (SEQ ID NO:47), CEQWDGYGAC (SEQ ID
NO:48), CWPFWDGSTC (SEQ ID NO:49), CMIWPDGEEC (SEQ ID NO:50),
CESQWDGYDC (SEQ ID NO:51), CQQWKEDGWC (SEQ ID NO:52), or
CLYQWDGYEC (SEQ ID NO:53); Class II - CMGDNLGDDC (SEQ ID
NO:54), CMGDSLGQSC (SEQ ID NO:55), CMDDDLGKGC (SEQ ID NO:56),
CMGENLGWSC (SEQ ID NO:57), or CLGESLGWMC (SEQ ID NO:58).

Moreover, the following SH2-binding peptides can be
used according to the methods of the present invention to
identify SH2 domain-containing polypeptides: GDGYEEISP (SEQ ID
NO:59) (for Src family), GDGYDEPSP (SEQ ID NO:60) (for Nck),
GDGYDHPSP (SEQ ID NO:61) (for Crk), GDGYVIPSP (SEQ ID NO:62)
(for PLCγN), GDGYQNYSP (SEQ ID NO:63) (for PLCγC), GDGYMAMSP
(SEQ ID NO:64) (for p85PI3KN and p85PI3KC), or GDGQNYSP (SEQ ID
NO:65) (for Grb2). See Songyang et al., 1993, Cell 72:767-778,
the complete disclosure of which is incorporated by reference
herein.

Further, polypeptides with a "PH" functional domain
(analogous to the proteins Vav, Bcr, mSos, PLCδ, Atk, or
Pleckstrin) can be identified using PH-binding peptides, such
as those described by Mayer et al., Cell 73:629-630, the
complete disclosure of which is incorporated by reference
herein.

Other recognition units can be readily contemplated,
including other synthetic, semisynthetic, or naturally derived
molecules.

The present invention is not to be limited in scope
by the specific embodiments described herein. Indeed, various
modifications of the invention in addition to those described
herein will become apparent to those skilled in the art from
the foregoing description and accompanying figures. Such
modifications are intended to fall within the scope of the
appended claims.

Various publications are cited herein, the
disclosures of which are incorporated by reference in their
entireties.